Abstract: We address the problem of adaptive minimax density estimation on $\mathbb{R}^d$ with
$L_p$-loss on the anisotropic Nikol'skii classes. We fully characterize the behavior of the minimax
risk for different relationships between regularity parameters and norm indexes in the definitions
of the functional class and of the risk. In particular, we show that there are four different
regimes with respect to the behavior of the minimax risk. We develop a single estimator which is
(nearly) optimal in order over the complete scale of the anisotropic Nikol'skii classes. Our
estimation procedure is based on a data-driven selection of an estimator from a fixed family
of kernel estimators.
1. Introduction

Let $X_1,\ldots,X_n$ be independent copies of a random vector $X\in\mathbb{R}^d$ having density $f$
with respect to the Lebesgue measure. We want to estimate $f$ using observations
$X^{(n)}=(X_1,\ldots,X_n)$. By estimator we mean any $X^{(n)}$-measurable map
$\hat f:(\mathbb{R}^d)^n\to L_p(\mathbb{R}^d)$. Accuracy of an estimator $\hat f$ is measured by the
$L_p$-risk

$$\mathcal{R}^{(n)}_p[\hat f,f]:=\big(E_f\|\hat f-f\|_p^p\big)^{1/p},\qquad p\in[1,\infty),$$

where $E_f$ denotes expectation with respect to the probability measure $P_f$ of the observations
$X^{(n)}=(X_1,\ldots,X_n)$, and $\|\cdot\|_p$, $p\in[1,\infty)$, is the $L_p$-norm on $\mathbb{R}^d$.
The objective is to construct an estimator of $f$ with small $L_p$-risk.

In the framework of the minimax approach the density $f$ is assumed to belong to a functional class
$\Sigma$, which is specified on the basis of prior information on $f$. Given a functional class
$\Sigma$, a natural accuracy measure of an estimator $\hat f$ is its maximal $L_p$-risk over $\Sigma$,

$$\mathcal{R}^{(n)}_p[\hat f;\Sigma]=\sup_{f\in\Sigma}\mathcal{R}^{(n)}_p[\hat f,f].$$

The main question is:

(i) how to construct a rate-optimal, or optimal in order, estimator $\hat f_*$ such that

$$\mathcal{R}^{(n)}_p[\hat f_*;\Sigma]\asymp\phi_n(\Sigma):=\inf_{\hat f}\mathcal{R}^{(n)}_p[\hat f;\Sigma],\qquad n\to\infty?$$

Here the infimum is taken over all possible estimators. We refer to the outlined problem as the
problem of minimax density estimation with $L_p$-loss on the class $\Sigma$.

Although the minimax approach provides a fair and convenient criterion for comparison between
different estimators, it lacks some flexibility. Typically $\Sigma$ is a class of functions that is
determined by some hyper-parameter, say, $\alpha$. (We write $\Sigma=\Sigma_\alpha$ in order to
indicate explicitly the dependence of the class $\Sigma$ on the corresponding hyper-parameter
$\alpha$.) In general, it turns out that an estimator which is optimal in order on the class
$\Sigma_\alpha$ is not optimal on the class $\Sigma_{\alpha'}$. This fact motivates the following
question:

(ii) is it possible to construct an estimator that is optimal in order on some scale of functional
classes $\{\Sigma_\alpha,\alpha\in A\}$ and not only on one class $\Sigma_\alpha$? In other words,
is it possible to construct an estimator $\hat f_*$ such that for any $\alpha\in A$ one has

$$\mathcal{R}^{(n)}_p[\hat f_*;\Sigma_\alpha]\asymp\phi_n(\Sigma_\alpha),\qquad n\to\infty?$$

We refer to this question as the problem of adaptive minimax density estimation on the scale of
classes $\{\Sigma_\alpha,\alpha\in A\}$.

The minimax and adaptive minimax density estimation with $L_p$-loss is the subject of a vast
literature; see, for example, Bretagnolle and Huber (1979), Ibragimov and Khasminskii (1980, 1981),
Devroye and Györfi (1985), Devroye and Lugosi (1996, 1997, 2001), Efroimovich (1986, 2008),
Hasminskii and Ibragimov (1990), Donoho et al. (1996), Golubev (1992), Kerkyacharian, Picard and
Tribouley (1996), Rigollet (2006), Massart (2007) [Chapter 7], Samarov and Tsybakov (2007),
Rigollet and Tsybakov (2007) and Birgé (2008). It is not our aim here to provide a complete review
of the literature on density estimation with $L_p$-loss. Below we will only discuss results that
are directly related to our study. First we review papers dealing with the one-dimensional setting;
then we proceed with the multivariate case.

The problem of minimax density estimation on $\mathbb{R}^1$ with $L_p$-loss, $p\in[2,\infty)$, was
studied by Bretagnolle and Huber (1979). In this paper the functional class $\Sigma$ is the class of
all densities such that
$\big[\|f^{(\beta)}\|_p\,\|f\|_{p/2}^{\beta}\big]^{1/(2\beta+1)}\le L<\infty$, where $f^{(\beta)}$
is the generalized derivative of order $\beta$. It was shown there that

$$\phi_n(\Sigma)\asymp n^{-\frac{1}{2+1/\beta}},\qquad\forall p\in[2,\infty).$$

Note that the same parameter $p$ appears in the definitions of the risk and of the functional class.

The problem of adaptive minimax density estimation on a compact interval of $\mathbb{R}^1$ with
$L_p$-loss was addressed in Donoho et al. (1996). In this paper the class $\Sigma$ is the Besov
functional class $B^{\beta}_{r,\theta}(L)$, where the parameter $\beta$ stands for the regularity
index, and $r$ is the index of the norm in which the regularity is measured. It is shown there that
there is an elbow in the rates of convergence for the minimax risk according to whether
$p<r(2\beta+1)$ (called in the literature the dense zone) or $p>r(2\beta+1)$ (the sparse zone). In
particular,

$$\phi_n\big(B^{\beta}_{r,\theta}(L)\big)\asymp
\begin{cases}
n^{-\frac{1}{2+1/\beta}}, & p<r(2\beta+1),\\[2pt]
(\ln n/n)^{\frac{1-1/(\beta r)+1/(\beta p)}{2-2/(\beta r)+1/\beta}}, & p>r(2\beta+1).
\end{cases}\qquad(1.1)$$

Donoho et al. (1996) develop a wavelet-based hard-thresholding estimator that achieves the
indicated rates (up to a $\ln n$-factor in the dense zone) for a scale of the Besov classes
$B^{\beta}_{r,\theta}(L)$ under the additional assumption $\beta r>1$.
It is quite remarkable that if the assumption that the underlying density has compact support is
dropped, then the minimax risk behavior becomes completely different. Specifically, Juditsky and
Lambert-Lacroix (2004) studied the problem of adaptive minimax density estimation on
$\mathbb{R}^1$ with $\Sigma$ being the Hölder class $N_{\infty,1}(\beta,L)$. Their results are in
striking contrast with those of Donoho et al. (1996): it is shown that

$$\phi_n\big(N_{\infty,1}(\beta,L)\big)\asymp
\begin{cases}
n^{-\frac{1}{2+1/\beta}}, & p>2+1/\beta,\\[2pt]
n^{-\frac{1-1/p}{1+1/\beta}}, & 1\le p\le 2+1/\beta.
\end{cases}$$

Juditsky and Lambert-Lacroix (2004) develop a wavelet-based estimator that achieves the indicated
rates up to a logarithmic factor on a scale of the Hölder classes. Note that the aforementioned
results of Donoho et al. (1996) applied to the Hölder class, $r=\infty$, yield the rate
$n^{-1/(2+1/\beta)}$ for any $p\ge 1$. Thus, the rate corresponding to the zone
$1\le p\le 2+1/\beta$ does not appear in the case of compactly supported densities.

In a recent paper, Reynaud-Bouret et al. (2011) consider the problem of adaptive density estimation
on $\mathbb{R}^1$ with $L_2$-losses on the Besov classes $B^{\beta}_{r,\theta}(L)$. It is shown
there that

$$\phi_n\big(B^{\beta}_{r,\theta}(L)\big)\asymp
\begin{cases}
n^{-\frac{1}{2+1/\beta}}, & 2/(2\beta+1)<r\le 2,\\[2pt]
n^{-\frac{1}{2-2/(\beta r)+2/\beta}}, & r>2.
\end{cases}$$

They also proposed a wavelet-based estimator that achieves the indicated rates up to a logarithmic
factor for a scale of Besov classes under the additional assumption $2\beta r>2-r$. It follows from
Donoho et al. (1996) that if $p=2$ and the density is compactly supported then the corresponding
rates are $\phi_n(\Sigma)\asymp n^{-1/(2+1/\beta)}$ for all $r\ge 2/(2\beta+1)$. Hence the rate
corresponding to the zone $r>2$, $p=2$, does not appear in the case of compactly supported
densities.

As for the multivariate setting, Ibragimov and Khasminskii in a series of papers [Ibragimov and
Khasminskii (1980, 1981), and Hasminskii and Ibragimov (1990)] studied the problem of minimax
density estimation with $L_p$-loss on $\mathbb{R}^d$. Together with some classes of infinitely
differentiable densities, they considered the anisotropic Nikol'skii classes
$\Sigma=N_{\vec r,d}(\vec\beta,\vec L)$, where $\vec\beta=(\beta_1,\ldots,\beta_d)$,
$\vec r=(r_1,\ldots,r_d)$ and $\vec L=(L_1,\ldots,L_d)$ (for the precise definition see
Section 3.1). It was shown that if $r_i=p$ for all $i=1,\ldots,d$ then

$$\phi_n\big(N_{\vec r,d}(\vec\beta,\vec L)\big)\asymp
\begin{cases}
n^{-\frac{1-1/p}{1-1/(\beta p)+1/\beta}}, & p\in[1,2),\\[2pt]
n^{-\frac{1}{2+1/\beta}}, & p\in[2,\infty).
\end{cases}\qquad(1.2)$$

Here $\beta$ is the parameter defined by the relation $1/\beta=\sum_{j=1}^d 1/\beta_j$. It should
be stressed that in the cited papers the same norm index $p$ is used in the definitions of the risk
and of the functional class. We also refer to the recent paper by Mason (2009), where further
discussion of these results can be found.

Delyon and Juditsky (1996) generalized the results of Donoho et al. (1996) to the minimax density
estimation on a bounded interval of $\mathbb{R}^d$, $d\ge 1$, over a collection of the isotropic
Besov classes. In particular, they showed that the minimax rates of convergence given by (1.1) hold
with $1/(\beta r)$ and $1/\beta$ replaced by $d/(\beta r)$ and $d/\beta$ respectively. Comparing the
rates in (1.2) with the asymptotics of the minimax risk found in Delyon and Juditsky (1996) with
$r=p$ we conclude that the rate in (1.2) in the zone $p\in[1,2)$ does not appear for compactly
supported densities.

Recently Goldenshluger and Lepski (2011b) developed an adaptive minimax estimator over a scale of
classes $N_{\vec r,d}(\vec\beta,\vec L)$; in particular, if $r_i=p$ for all $i=1,\ldots,d$ then

$$\phi_n\big(N_{\vec r,d}(\vec\beta,\vec L)\big)\asymp
\begin{cases}
n^{-\frac{1}{2+1/\beta}}, & p\ge 2,\\[2pt]
n^{-\frac{1-1/p}{1-1/(\beta p)+1/\beta}}, & p\in(1,2).
\end{cases}$$

Note that in the considered setting the norm indexes in the definitions of the risk and the
functional class coincide.

The results discussed above show that there is an essential difference between the problems of
density estimation on the whole space and on a compact interval. The literature on density
estimation on the whole space is quite fragmented, and relationships between the aforementioned
results are yet to be understood. These relationships become even more complex and interesting in
the multivariate setting where the density to be estimated belongs to a functional class with
anisotropic and inhomogeneous smoothness. The study of minimax estimation under $L_p$-loss over
homogeneous Sobolev $L_q$-balls ($q\ne p$) was initiated in Nemirovski (1985) in the regression
model on the unit cube of $\mathbb{R}^d$. Functional classes with anisotropic and inhomogeneous
smoothness were first considered in Kerkyacharian et al. (2001, 2008) for the Gaussian white noise
model on a compact subset of $\mathbb{R}^d$. In the density estimation model Akakpo (2012) studied
the case $p=2$ and considered compactly supported densities on $[0,1]^d$.

To the best of our knowledge, the problem of estimating a multivariate density from anisotropic
and inhomogeneous functional classes on $\mathbb{R}^d$ has not been considered in the literature.
This problem is the subject of the current paper. Our results cover the existing ones and
generalize them in the following directions.

We fully characterize the behavior of the minimax risk for all possible relationships between
regularity parameters and norm indexes in the definition of the functional classes and of the risk.
In particular, we discover that there are four different regimes with respect to the minimax rates
of convergence: tail, dense and sparse zones, and the last zone, in its turn, is subdivided into two
regions. Existence of these regimes is not a consequence of the multivariate nature of the problem
or the considered functional classes; in fact, these regimes appear already in dimension one. Thus
our results reveal all possible zones with respect to the rates of convergence in the problem of
density estimation on $\mathbb{R}^d$ and explain different results on rates of convergence in the
existing literature. In particular, results in Juditsky and Lambert-Lacroix (2004) and
Reynaud-Bouret et al. (2011) pertain to the rates of convergence in the tail and dense zones, while
those in Donoho et al. (1996) and Delyon and Juditsky (1996) correspond to the dense zone and to a
subregion of the sparse zone.

We propose an estimator that is based upon a data-driven selection from a family of kernel esti- 
mators, and establish for it a pointwise oracle inequality. Then we use this inequality for derivation 
of bounds on the Lp-risk over a collection of the Nikol'skii functional classes. Since the construction 
of our estimator does not use any prior information on the class parameters, it is adaptive minimax 
over a scale of these classes. Moreover, we believe that the method of deriving Lp-risk bounds from 
pointwise oracle inequalities employed in the proof of Theorem 2 is of interest in its own right. It 
is quite general and can be applied to other nonparametric estimation problems. 

Another issue studied in the present paper is related to the existence of the tail zone. This zone
does not exist in the problem of estimating compactly supported densities. Then a natural question
arises: what is a general condition on $f$ which ensures the same asymptotics of the minimax risk
on $\mathbb{R}^d$ as in the case of compactly supported densities? We propose a tail dominance
condition and show that, in a sense, it is the weakest possible condition under which the tail zone
disappears. We also show that this condition guarantees existence of a consistent estimator under
$L_1$-loss. Recall that smoothness alone is not sufficient in order to guarantee consistency of
density estimators in $L_1(\mathbb{R}^d)$ [see Ibragimov and Khasminskii (1981)].

The paper is structured as follows. In Section 2 we define our estimation procedure and derive
the corresponding pointwise oracle inequality. Section 3 presents upper and lower bounds on the
minimax risk. We also discuss the obtained results and relate them to the existing results in the
literature. The same estimation problem under the tail dominance condition is studied in
Section 4. Sections 5-7 contain proofs of Theorems 1-4; proofs of auxiliary results are relegated
to Appendices A and B.

The following notation and conventions are used throughout the paper. For vectors
$u,v\in\mathbb{R}^d$ the operations $u/v$, $u\vee v$, $u\wedge v$ and inequalities such as $u\le v$
are all understood in the coordinate-wise sense. For instance,
$u\vee v=(u_1\vee v_1,\ldots,u_d\vee v_d)$. All integrals are taken over $\mathbb{R}^d$ unless the
domain of integration is specified explicitly. For a Borel set $A\subset\mathbb{R}^d$ the symbol
$|A|$ stands for the Lebesgue measure of $A$; if $A$ is a finite set, $|A|$ denotes the cardinality
of $A$.

2. Estimation procedure and pointwise oracle inequality 

In this section we define our estimation procedure and derive an upper bound on its pointwise risk. 
2.1. Estimation procedure 

Our estimation procedure is based on data-driven selection from a family of kernel estimators. The 
family of estimators is defined as follows. 

2.1.1. Family of kernel estimators 

Let $K:[-1/2,1/2]^d\to\mathbb{R}^1$ be a fixed kernel such that $K\in C(\mathbb{R}^d)$,
$\int K(x)\,dx=1$, and $\|K\|_\infty<\infty$. Let

$$\mathcal{H}=\big\{h=(h_1,\ldots,h_d)\in(0,1]^d:\ h_j=2^{-k_j},\ k_j=0,\ldots,\log_2 n,\ j=1,\ldots,d\big\};$$

without loss of generality we assume that $\log_2 n$ is an integer.

Given a bandwidth $h\in\mathcal{H}$, define the corresponding kernel estimator of $f$ by the formula

$$\hat f_h(x):=\frac{1}{nV_h}\sum_{i=1}^n K\Big(\frac{X_i-x}{h}\Big)=\frac1n\sum_{i=1}^n K_h(X_i-x),$$

where $V_h:=\prod_{j=1}^d h_j$ and $K_h(\cdot):=(1/V_h)K(\cdot/h)$. Consider the family of kernel
estimators

$$\mathcal{F}(\mathcal{H}):=\{\hat f_h,\ h\in\mathcal{H}\}.$$

The proposed estimation procedure is based on data-driven selection of an estimator from
$\mathcal{F}(\mathcal{H})$.
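
As an illustration of the family just described, the following minimal Python sketch builds the dyadic bandwidth set and evaluates one kernel estimator at a point. The function names (`dyadic_grid`, `triangular_kernel`, `kernel_estimate`) and the triangular product kernel are assumptions made for the example only; the paper fixes a specific kernel later, in Section 3.2.

```python
import itertools
import math


def dyadic_grid(n, d):
    """Bandwidth set H: all h with h_j = 2**(-k_j), k_j = 0, ..., log2(n)."""
    k_max = int(math.log2(n))  # the text assumes log2(n) is an integer
    levels = [2.0 ** (-k) for k in range(k_max + 1)]
    return list(itertools.product(levels, repeat=d))


def triangular_kernel(t):
    """Illustrative K (an assumption): product of k(u) = max(0, 2 - 4|u|).

    Each factor is continuous, bounded, supported on [-1/2, 1/2] and
    integrates to one.
    """
    return math.prod(max(0.0, 2.0 - 4.0 * abs(u)) for u in t)


def kernel_estimate(sample, x, h, kernel=triangular_kernel):
    """f_h(x) = (n * V_h)**(-1) * sum_i K((X_i - x) / h), V_h = prod_j h_j."""
    v_h = math.prod(h)
    total = 0.0
    for x_i in sample:
        scaled = [(a - b) / c for a, b, c in zip(x_i, x, h)]
        total += kernel(scaled)
    return total / (len(sample) * v_h)
```

For sample size $n$ the grid contains $(\log_2 n+1)^d$ bandwidth vectors, so the family stays small enough to scan exhaustively at every point $x$.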

2.1.2. Auxiliary estimators 

Our selection rule uses auxiliary estimators that are constructed as follows. For any pair
$h,\eta\in\mathcal{H}$ define the kernel $K_h*K_\eta$ by the formula
$[K_h*K_\eta](t)=\int K_h(t-y)K_\eta(y)\,dy$. Let $\hat f_{h,\eta}(x)$ denote the estimator
associated with this kernel:

$$\hat f_{h,\eta}(x)=\frac1n\sum_{i=1}^n[K_h*K_\eta](X_i-x).$$

The following representation of kernels $K_h*K_\eta$ will be useful: for any $h,\eta\in\mathcal{H}$

$$[K_h*K_\eta](t)=\frac{1}{V_{h\vee\eta}}\,Q_{h,\eta}\Big(\frac{t}{h\vee\eta}\Big),\qquad(2.2)$$

where the function $Q_{h,\eta}$ is given by the formula

$$Q_{h,\eta}(t)=\int K\big(v(y,t-\nu y)\big)K\big(v(t-\nu y,y)\big)\,dy,\qquad \nu:=\frac{h\wedge\eta}{h\vee\eta}.\qquad(2.3)$$

Here the function $v:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}^d$ is defined coordinate-wise by
$v_j(y,z)=y_j$ if $h_j\le\eta_j$ and $v_j(y,z)=z_j$ if $h_j>\eta_j$, $j=1,\ldots,d$.

The representation (2.2)-(2.3) is obtained by a straightforward change of variables in the
convolution integral [see the proof of Lemma 12 in Goldenshluger and Lepski (2011a)]. We also note
that $\mathrm{supp}(Q_{h,\eta})\subseteq[-1,1]^d$, and $\|Q_{h,\eta}\|_\infty\le\|K\|_\infty^2$ for
all $h,\eta$. In the special case where $K(t)=\prod_{i=1}^d k(t_i)$ for some univariate kernel
$k:[-1/2,1/2]\to\mathbb{R}^1$ we have that

$$Q_{h,\eta}(t)=\prod_{i=1}^d\int k(t_i-\nu_i u_i)k(u_i)\,du_i,\qquad \nu_i=(h_i\wedge\eta_i)/(h_i\vee\eta_i).$$

We also define

$$Q(t)=\sup_{h,\eta\in\mathcal{H}}\Big|\int K\big(v(y,t-\nu y)\big)K\big(v(t-\nu y,y)\big)\,dy\Big|,$$

and note that $\mathrm{supp}(Q)\subseteq[-1,1]^d$, and $\|Q\|_\infty\le\|K\|_\infty^2$.

2.1.3. Stochastic errors of kernel estimators and their majorants 

Uniform moment bounds on stochastic errors of kernel estimators $\hat f_h(x)$ and
$\hat f_{h,\eta}(x)$ will play an important role in the construction of our selection rule. Let

$$\xi_h(x)=\frac1n\sum_{i=1}^n K_h(X_i-x)-\int K_h(t-x)f(t)\,dt,\qquad(2.4)$$

$$\xi_{h,\eta}(x)=\frac1n\sum_{i=1}^n[K_h*K_\eta](X_i-x)-\int[K_h*K_\eta](t-x)f(t)\,dt$$

denote the stochastic errors of $\hat f_h$ and $\hat f_{h,\eta}$ respectively. In order to
construct our selection rule we need to find uniform upper bounds (majorants) on $\xi_h$ and
$\xi_{h,\eta}$, i.e. we need to find functions $M_h$ and $M_{h,\eta}$ such that moments of the
random variables

$$\sup_{h\in\mathcal{H}}\big[|\xi_h(x)|-M_h(x)\big]_+,\qquad
\sup_{h,\eta\in\mathcal{H}}\big[|\xi_{h,\eta}(x)|-M_{h,\eta}(x)\big]_+$$

are "small" for each $x\in\mathbb{R}^d$. We will also be interested in the integrability properties
of these moments.
It turns out that the majorants $M_h(x)$ and $M_{h,\eta}(x)$ can be defined in the following way.
For a function $g:\mathbb{R}^d\to\mathbb{R}^1$ let

$$A_h(g,x)=\int|g_h(t-x)|f(t)\,dt,\qquad g_h(\cdot)=V_h^{-1}g(\cdot/h),\quad h\in\mathcal{H}.\qquad(2.5)$$

Now define

$$M_h(g,x)=\sqrt{\frac{\varkappa A_h(g,x)\ln n}{nV_h}}+\frac{\varkappa\ln n}{nV_h},\qquad(2.6)$$

where $\varkappa$ is a positive constant to be specified. In Lemma 2 in Section 5 we show that under
an appropriate choice of the parameter $\varkappa$ the functions

$$M_h(x):=M_h(K,x),\qquad M_{h,\eta}(x):=M_{h\vee\eta}(Q,x)\qquad(2.7)$$

uniformly majorate $\xi_h$ and $\xi_{h,\eta}$.

It should be noted, however, that functions $M_h(x)$ and $M_{h,\eta}(x)$ given by (2.7) cannot be
directly used in construction of the selection rule because they depend on the unknown density $f$
to be estimated. We will use empirical counterparts of $M_h(x)$ and $M_{h,\eta}(x)$ instead.

For $g:\mathbb{R}^d\to\mathbb{R}^1$ we let

$$\hat A_h(g,x)=\frac1n\sum_{i=1}^n|g_h(X_i-x)|,$$

and define

$$\hat M_h(g,x)=4\sqrt{\frac{\varkappa\hat A_h(g,x)\ln n}{nV_h}}+\frac{4\varkappa\ln n}{nV_h}.\qquad(2.8)$$
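
The empirical majorant can be computed directly from the sample. The sketch below is a hedged illustration: the helper names are hypothetical, and the two constants 4 are an assumption read off the partly legible display (2.8), not a value confirmed by the surrounding text.

```python
import math


def empirical_A(g, sample, x, h):
    """(1/n) * sum_i |g_h(X_i - x)|, where g_h(t) = g(t / h) / V_h."""
    v_h = math.prod(h)
    total = 0.0
    for x_i in sample:
        scaled = [(a - b) / c for a, b, c in zip(x_i, x, h)]
        total += abs(g(scaled))
    return total / (len(sample) * v_h)


def empirical_majorant(g, sample, x, h, kappa):
    """4*sqrt(kappa*A_hat*ln(n)/(n*V_h)) + 4*kappa*ln(n)/(n*V_h).

    The factors 4 are assumed (see the lead-in); kappa is the tuning
    constant of the procedure.
    """
    n = len(sample)
    u = kappa * math.log(n) / (n * math.prod(h))
    return 4.0 * math.sqrt(empirical_A(g, sample, x, h) * u) + 4.0 * u
```

Both terms grow as the bandwidth volume $V_h$ shrinks, which is what lets the majorant act as a variance-type penalty in the selection rule.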



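2.1.4. Selection rule and final estimator

Now we are in a position to define our selection rule. For every $x\in\mathbb{R}^d$ let

$$\hat R_h(x)=\sup_{\eta\in\mathcal{H}}\Big[|\hat f_{h,\eta}(x)-\hat f_\eta(x)|-\hat M_{h\vee\eta}(Q,x)-\hat M_\eta(K,x)\Big]_+
+\sup_{\eta\ge h}\hat M_\eta(Q,x)+\hat M_h(K,x),\qquad h\in\mathcal{H}.\qquad(2.9)$$

The selected bandwidth $\hat h(x)$ and the corresponding estimator are defined by

$$\hat h(x)=\arg\min_{h\in\mathcal{H}}\hat R_h(x),\qquad \hat f(x)=\hat f_{\hat h(x)}(x),\qquad x\in\mathbb{R}^d.\qquad(2.10)$$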
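
The selection step at a fixed point $x$ can be sketched as follows, assuming all estimator and majorant values at $x$ have already been computed and stored in dictionaries keyed by bandwidth tuples. The function and argument names are hypothetical; the positive part in the first supremum is how the rule is read here.

```python
def join(h, eta):
    """Coordinate-wise maximum of two bandwidth vectors."""
    return tuple(max(a, b) for a, b in zip(h, eta))


def dominates(eta, h):
    """True when eta >= h in every coordinate."""
    return all(a >= b for a, b in zip(eta, h))


def select_bandwidth(grid, f_hat, f_aux, m_K, m_Q):
    """Return the bandwidth minimising the criterion R_h(x) at a fixed x.

    grid  : list of bandwidth tuples
    f_hat : dict h -> kernel estimate at x
    f_aux : dict (h, eta) -> auxiliary (convolution-kernel) estimate at x
    m_K   : dict h -> empirical majorant computed with g = K
    m_Q   : dict h -> empirical majorant computed with g = Q
    """
    def criterion(h):
        excess = max(
            max(abs(f_aux[(h, eta)] - f_hat[eta])
                - m_Q[join(h, eta)] - m_K[eta], 0.0)
            for eta in grid
        )
        penalty = max(m_Q[eta] for eta in grid if dominates(eta, h)) + m_K[h]
        return excess + penalty

    return min(grid, key=criterion)
```

Because the grid is finite, the minimum is attained; ties are broken by the order of `grid`, a detail that the sketch leaves unspecified.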



Note that the estimation procedure is completely determined by the family of kernel estimators
$\mathcal{F}(\mathcal{H})$ and by the constant $\varkappa$ appearing in the definition of
$\hat M_h$.

We have to ensure that the map $x\mapsto\hat f_{\hat h(x)}(x)$ is an $X^{(n)}$-measurable Borel
function. This follows from continuity of $K$ and the fact that $\mathcal{H}$ is a discrete set;
for details see Appendix A, Section A.1.
2.2. Pointwise oracle inequality

Let $B_h(f,t)$ be the bias of the kernel estimator $\hat f_h(t)$,

$$B_h(f,t)=\int K_h(y-t)f(y)\,dy-f(t),\qquad(2.11)$$

and define

$$\bar B_h(f,x)=|B_h(f,x)|\vee\sup_{\eta\in\mathcal{H}}\Big|\int K_\eta(t-x)B_h(f,t)\,dt\Big|.\qquad(2.12)$$

Theorem 1. For any $x\in\mathbb{R}^d$ one has

$$|\hat f(x)-f(x)|\le\inf_{h\in\mathcal{H}}\Big\{4\bar B_h(f,x)+60\sup_{\eta\ge h}M_\eta(Q,x)+61M_h(K,x)\Big\}+7\zeta(x)+18\chi(x),\qquad(2.13)$$

where

$$\zeta(x):=\sup_{h\in\mathcal{H}}\big[|\xi_h(x)|-M_h(K,x)\big]_+\vee\sup_{h,\eta\in\mathcal{H}}\big[|\xi_{h,\eta}(x)|-M_{h\vee\eta}(Q,x)\big]_+,\qquad(2.14)$$

$$\chi(x):=\max_{g\in\{K,Q\}}\sup_{h\in\mathcal{H}}\big[|\hat A_h(g,x)-A_h(g,x)|-M_h(g,x)\big]_+.\qquad(2.15)$$

Furthermore, for any $q\ge 1$, if $\varkappa\ge[\|K\|_\infty\vee 1]^2[(4d+2)q+4(d+1)]$ then

$$\int E_f\big\{[\zeta(x)]^q+[\chi(x)]^q\big\}\,dx\le Cn^{-q/2},\qquad\forall n\ge 3,\qquad(2.16)$$

where $C$ is a constant depending on $d$, $q$ and $\|K\|_\infty$ only.

We remark that Theorem 1 does not require any conditions on the estimated density $f$.

3. Adaptive estimation over anisotropic Nikol'skii classes 

In this section we study properties of the estimator defined in (2.9)-(2.10). The pointwise oracle 
inequality of Theorem 1 is the key technical tool for bounding Lp-risk of this estimator on the 
anisotropic Nikol'skii classes. 

3.1. Anisotropic Nikol'skii classes 

Let $(e_1,\ldots,e_d)$ denote the canonical basis of $\mathbb{R}^d$. For a function
$g:\mathbb{R}^d\to\mathbb{R}^1$ and a real number $u\in\mathbb{R}$ define the first order difference
operator with step size $u$ in direction of the variable $x_j$ by

$$\Delta_{u,j}g(x)=g(x+ue_j)-g(x),\qquad j=1,\ldots,d.$$

By induction, the $k$-th order difference operator with step size $u$ in direction of the variable
$x_j$ is defined as

$$\Delta^k_{u,j}g(x)=\Delta_{u,j}\Delta^{k-1}_{u,j}g(x)=\sum_{l=1}^k(-1)^{l+k}\binom{k}{l}\Delta_{ul,j}g(x).\qquad(3.1)$$
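
The two expressions for the $k$-th order difference (the recursive one and the binomial sum over first order differences with steps $ul$) can be checked against each other numerically. The following Python sketch uses hypothetical helper names and represents points of $\mathbb{R}^d$ as lists.

```python
from math import comb


def first_difference(g, x, u, j):
    """Delta_{u,j} g(x) = g(x + u * e_j) - g(x)."""
    shifted = list(x)
    shifted[j] += u
    return g(shifted) - g(list(x))


def kth_difference(g, x, u, j, k):
    """Binomial form: sum_{l=1}^k (-1)**(l+k) * C(k, l) * Delta_{u*l, j} g(x)."""
    return sum(
        (-1) ** (l + k) * comb(k, l) * first_difference(g, x, u * l, j)
        for l in range(1, k + 1)
    )


def kth_difference_recursive(g, x, u, j, k):
    """Inductive form: Delta^k = Delta applied to Delta^(k-1); Delta^0 g = g."""
    if k == 0:
        return g(list(x))
    inner = lambda y: kth_difference_recursive(g, y, u, j, k - 1)
    return first_difference(inner, x, u, j)
```

For a polynomial of degree $k$ in $x_j$ with leading coefficient one, both forms return $k!\,u^k$, which gives a quick sanity check of the sign convention $(-1)^{l+k}$.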



Definition 1. For given real numbers $\vec r=(r_1,\ldots,r_d)$, $r_j\in[1,\infty]$,
$\vec\beta=(\beta_1,\ldots,\beta_d)$, $\beta_j>0$, and $\vec L=(L_1,\ldots,L_d)$, $L_j>0$,
$j=1,\ldots,d$, we say that a function $g:\mathbb{R}^d\to\mathbb{R}^1$ belongs to the anisotropic
Nikol'skii class $N_{\vec r,d}(\vec\beta,\vec L)$ if

(i) $\|g\|_{r_j}\le L_j$ for all $j=1,\ldots,d$;

(ii) for every $j=1,\ldots,d$ there exists a natural number $k_j>\beta_j$ such that

$$\big\|\Delta^{k_j}_{u,j}g\big\|_{r_j}\le L_j|u|^{\beta_j},\qquad\forall u\in\mathbb{R},\ \forall j=1,\ldots,d.\qquad(3.2)$$

The condition that for every $j=1,\ldots,d$ there exists $k_j>\beta_j$ such that (3.2) holds can be
replaced by the condition that (3.2) holds for every $k>\beta_j$, $j=1,\ldots,d$; see (Nikol'skii
1977, Section 4.3.3).

The anisotropic Nikol'skii class is a specific case of the anisotropic Besov class, often
encountered in the nonparametric estimation literature. In particular,
$N_{\vec r,d}(\vec\beta,\cdot)=B^{\beta_1,\ldots,\beta_d}_{r_1,\ldots,r_d;\infty,\ldots,\infty}(\cdot)$,
see (Nikol'skii 1977, Section 4.3.4).



3.2. Construction of kernel K 

We will use the following specific kernel $K$ in the definition of the family
$\mathcal{F}(\mathcal{H})$ [see, e.g., Kerkyacharian et al. (2001) or Goldenshluger and Lepski
(2011b)].

Let $\ell$ be an integer number, and let $w:[-1/(2\ell),1/(2\ell)]\to\mathbb{R}^1$ be a function
satisfying $\int w(y)\,dy=1$, and $w\in C(\mathbb{R}^1)$. Put

$$w_\ell(y)=\sum_{i=1}^{\ell}\binom{\ell}{i}(-1)^{i+1}\frac1i\,w\Big(\frac yi\Big),\qquad
K(t)=\prod_{j=1}^d w_\ell(t_j),\quad t=(t_1,\ldots,t_d).\qquad(3.3)$$

The kernel $K$ constructed in this way is bounded, supported on $[-1/2,1/2]^d$, belongs to
$C(\mathbb{R}^d)$ and satisfies

$$\int K(t)\,dt=1,\qquad\int K(t)t^k\,dt=0,\quad\forall|k|=1,\ldots,\ell-1,$$

where $k=(k_1,\ldots,k_d)$ is the multi-index, $k_i\ge 0$, $|k|=k_1+\cdots+k_d$, and
$t^k=t_1^{k_1}\cdots t_d^{k_d}$ for $t=(t_1,\ldots,t_d)$.
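
The moment conditions of this construction can be verified numerically in one coordinate. The sketch below is only an illustration: the cosine bump chosen for $w$, the midpoint quadrature and the function names are assumptions of the example, not part of the construction in the text.

```python
import math


def w(y, ell):
    """Illustrative bump (an assumption): supported on [-1/(2 ell), 1/(2 ell)],
    continuous on the real line, with integral one."""
    if abs(y) > 1.0 / (2 * ell):
        return 0.0
    return ell * (1.0 + math.cos(2.0 * math.pi * ell * y))


def w_ell(y, ell):
    """w_ell(y) = sum_{i=1}^{ell} C(ell, i) * (-1)**(i+1) * w(y / i) / i."""
    return sum(
        math.comb(ell, i) * (-1) ** (i + 1) * w(y / i, ell) / i
        for i in range(1, ell + 1)
    )


def moment(k, ell, grid_size=20000):
    """Midpoint-rule approximation of the integral of y**k * w_ell(y) over [-1/2, 1/2]."""
    step = 1.0 / grid_size
    total = 0.0
    for m in range(grid_size):
        y = -0.5 + (m + 0.5) * step
        total += (y ** k) * w_ell(y, ell)
    return total * step
```

With `ell = 3` the zeroth moment is close to one and the first and second moments are close to zero, whereas for `ell = 1` the second moment is strictly positive; the product kernel inherits the vanishing moments coordinate by coordinate.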



3.3. Main results 

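Let $N_{\vec r,d}(\vec\beta,\vec L)$ be the anisotropic Nikol'skii functional class. Put

$$\frac1\beta:=\sum_{j=1}^d\frac{1}{\beta_j},\qquad\frac1s:=\sum_{j=1}^d\frac{1}{\beta_jr_j},\qquad L_\beta:=\prod_{j=1}^dL_j^{1/\beta_j},$$

and define

$$\nu=\begin{cases}
\dfrac{1-1/p}{1-1/s+1/\beta}, & p\le\dfrac{2+1/\beta}{1+1/s},\\[8pt]
\dfrac{\beta}{2\beta+1}, & \dfrac{2+1/\beta}{1+1/s}\le p\le s(2+1/\beta),\\[8pt]
s/p, & p>s(2+1/\beta),\ s<1,\\[4pt]
\dfrac{1-1/s+1/(p\beta)}{2-2/s+1/\beta}, & p>s(2+1/\beta),\ s\ge 1,
\end{cases}\qquad(3.4)$$

$$\mu_n=\begin{cases}
(\ln n)^{1/p}, & p\in\Big\{\dfrac{2+1/\beta}{1+1/s},\ s(2+1/\beta)\Big\},\\[6pt]
1, & \text{otherwise}.
\end{cases}$$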
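
To make the zone structure concrete, the following Python sketch computes the effective indexes $\beta$ and $s$ and then the exponent $\nu$ together with the zone it falls in. Function names and zone labels are hypothetical; on a boundary value of $p$ adjacent formulas give the same exponent, so the choice of branch there is immaterial.

```python
import math


def effective_indexes(betas, rs):
    """Return (beta, s): 1/beta = sum_j 1/beta_j, 1/s = sum_j 1/(beta_j * r_j)."""
    inv_beta = sum(1.0 / b for b in betas)
    inv_s = sum(0.0 if math.isinf(r) else 1.0 / (b * r)
                for b, r in zip(betas, rs))
    return 1.0 / inv_beta, (math.inf if inv_s == 0.0 else 1.0 / inv_s)


def rate_exponent(p, beta, s):
    """Return (zone label, nu) for loss index p and effective indexes beta, s."""
    inv_s = 0.0 if math.isinf(s) else 1.0 / s
    tail_edge = (2.0 + 1.0 / beta) / (1.0 + inv_s)
    sparse_edge = s * (2.0 + 1.0 / beta)  # infinite when s is infinite
    if p <= tail_edge:
        return "tail", (1.0 - 1.0 / p) / (1.0 - inv_s + 1.0 / beta)
    if p <= sparse_edge:
        return "dense", beta / (2.0 * beta + 1.0)
    if s < 1.0:
        return "sparse, s < 1", s / p
    return "sparse, s >= 1", (1.0 - inv_s + 1.0 / (p * beta)) / (2.0 - 2.0 * inv_s + 1.0 / beta)
```

For instance, a univariate Hölder class with $\beta=1$ and $r=\infty$ gives $s=\infty$, so the sparse zone is empty, and $p=2$ lies in the tail zone with $\nu=1/4$.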



In contrast to Theorem 1, proved over the set of all probability densities, the adaptive results
presented below require an additional assumption: the estimated density should be uniformly
bounded. For this purpose we define for any $M>0$

$$N_{\vec r,d}(\vec\beta,\vec L,M):=N_{\vec r,d}(\vec\beta,\vec L)\cap\{f:\ \|f\|_\infty\le M\}.$$

Note, however, that if $J:=\{j=1,\ldots,d:\ r_j=\infty\}$ is nonempty then
$N_{\vec r,d}(\vec\beta,\vec L,M)=N_{\vec r,d}(\vec\beta,\vec L)$ with $M=\inf_{j\in J}L_j$.
Moreover, in view of the embedding theorem for the anisotropic Nikol'skii classes [see Section 6.1
below], the condition $s>1$ implies that the density to be estimated belongs to a class of
uniformly bounded and continuous functions. Thus, if $s>1$ one has
$N_{\vec r,d}(\vec\beta,\vec L,M)=N_{\vec r,d}(\vec\beta,\vec L)$ with some $M$ completely
determined by $\vec L$.

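The asymptotic behavior of the $L_p$-risk on the class $N_{\vec r,d}(\vec\beta,\vec L,M)$ is
characterized in the next two theorems.

Let the family $\mathcal{F}(\mathcal{H})$ be associated with kernel (3.3). Let $\hat f$ denote the
estimator given by the selection rule (2.9)-(2.10) with
$\varkappa=(\|K\|_\infty\vee 1)^2[(4d+2)p+4(d+1)]$ that is applied to the family
$\mathcal{F}(\mathcal{H})$.

Theorem 2. For any $M>0$, $L_0>0$, $\ell\in\mathbb{N}^*$, any $\vec\beta\in(0,\ell]^d$,
$\vec r\in(1,\infty]^d$, any $\vec L$ satisfying $\min_{j=1,\ldots,d}L_j\ge L_0$, and any
$p\in(1,\infty)$ one has

$$\limsup_{n\to\infty}\Big\{\mu_n^{-1}\Big(\frac{L_\beta\ln n}{n}\Big)^{-\nu}\mathcal{R}^{(n)}_p\big[\hat f;N_{\vec r,d}(\vec\beta,\vec L,M)\big]\Big\}\le C<\infty.$$

Here the constant $C$ does not depend on $\vec L$ in the cases $p\le s(2+1/\beta)$ and
$p\ge s(2+1/\beta)$, $s<1$.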
Remark 1.

1. The condition $\min_{j=1,\ldots,d}L_j\ge L_0$ ensures independence of the constant $C$ of
$\vec L$ in the cases $p\le s(2+1/\beta)$ and $p\ge s(2+1/\beta)$, $s<1$. If $p\ge s(2+1/\beta)$,
$s\ge 1$ then $C$ depends on $\vec L$, and the corresponding expressions can be easily extracted
from the proof of the theorem. We note that in this case the map $\vec L\mapsto C(\vec L)$ is
bounded on each closed cube of $(0,\infty)^d$.

2. We consider the case $1<p<\infty$ only, not including $p=1$ and $p=\infty$. It is well known
[Ibragimov and Khasminskii (1981)] that smoothness alone is not sufficient in order to guarantee
consistency of density estimators in $L_1(\mathbb{R}^d)$; see also Theorem 3 for a lower bound. The
case $p=\infty$ was considered recently in Lepski (2012).

3. As discussed above, Theorem 2 requires uniform boundedness of the estimated density, i.e.
$\|f\|_\infty\le M<\infty$. We note however that our estimator $\hat f$ is fully adaptive, i.e.,
its construction does not use any information on the parameters $\vec\beta$, $\vec r$, $\vec L$ and
$M$.
Now we present lower bounds on the minimax risk. Define

$$\alpha_n=\begin{cases}
\ln n, & p>s(2+1/\beta),\ s\ge 1,\\
1, & \text{otherwise}.
\end{cases}$$

Theorem 3. Let $\vec\beta\in(0,\infty)^d$, $\vec r\in[1,\infty]^d$, $\vec L\in(0,\infty)^d$ and
$M>0$ be fixed.

(i) There exists $c>0$ such that

$$\liminf_{n\to\infty}\Big\{\Big(\frac{L_\beta\alpha_n}{n}\Big)^{-\nu}\inf_{\tilde f}\mathcal{R}^{(n)}_p\big[\tilde f;N_{\vec r,d}(\vec\beta,\vec L,M)\big]\Big\}\ge c,\qquad\forall p\in[1,\infty),$$

where the infimum is taken over all possible estimators $\tilde f$. If
$\min_{j=1,\ldots,d}L_j\ge L_0>0$ then in the cases $p\le s(2+1/\beta)$ or $p\ge s(2+1/\beta)$ and
$s<1$ the constant $c$ is independent of $\vec L$.

(ii) Let $p=\infty$ and $s\le 1$; then there is no consistent estimator, i.e., for some $c>0$

$$\liminf_{n\to\infty}\inf_{\tilde f}\sup_{f\in N_{\vec r,d}(\vec\beta,\vec L,M)}E_f\|\tilde f-f\|_\infty>c.$$

Remark 2.

1. Inspection of the proof shows that if $\max_{j=1,\ldots,d}L_j\le L_\infty<\infty$ then
statement (i) is valid with a constant $c$ depending on $\vec\beta$, $\vec r$, $L_0$, $L_\infty$,
$d$ and $M$ only.

2. As mentioned above, adaptive minimax density estimation on $\mathbb{R}^d$ under
$L_\infty$-loss was the subject of the recent paper Lepski (2012). A minimax adaptive estimator is
constructed in that paper under the assumption $s>1$. Thus, statement (ii) of Theorem 3 finalizes
the research on adaptive density estimation in the supremum norm.

3.4. Discussion

The results of Theorem 2 together with the matching lower bounds of Theorem 3 provide a complete
classification of minimax rates of convergence in the problem of density estimation on
$\mathbb{R}^d$. In particular, we discover four different zones with respect to the minimax rates
of convergence.

• Tail zone corresponds to "small" $p$, $1<p\le\frac{2+1/\beta}{1+1/s}$. This zone does not appear
if the density $f$ is assumed to be compactly supported, or some tail dominance condition is
imposed, see Section 4.

• Dense zone is characterized by the "intermediate" range of $p$,
$\frac{2+1/\beta}{1+1/s}\le p\le s(2+1/\beta)$. Here the "usual" rate of convergence
$n^{-\beta/(2\beta+1)}$ holds.

• Sparse zone corresponds to "large" $p$, $p\ge s(2+1/\beta)$. As Theorems 2 and 3 show, this zone,
in its turn, is subdivided into two regions with $s\ge 1$ and $s<1$. This phenomenon was not
observed in the existing literature even for settings with compactly supported densities. For
other statistical models (regression, white Gaussian noise etc.) this result is also new.

It is important to emphasize that existence of these zones is not related to the multivariate
nature of the problem or to the anisotropic smoothness of the estimated density. In fact, these
results hold already for the one-dimensional case, and this, to a limited degree, was observed in
the previous works. In the subsequent remarks we discuss relationships between our results and the
existing results in the literature, and comment on some open problems.
1. In Donoho et al. (1996), Delyon and Juditsky (1996) and Kerkyacharian et al. (2008) the
sparse zone is defined as $p\ge s(2+1/\beta)$, $s\ge 1$. Recall that the condition $s>1$ implies
that the density to be estimated belongs to a class of uniformly bounded and continuous functions.
In the sparse zone we consider also the case $s<1$, but the density $f$ is assumed to be uniformly
bounded. It turns out that in this zone the rate corresponding to the index $\nu=s/p$ emerges.

2. The one-dimensional setting was considered in Juditsky and Lambert-Lacroix (2004) and
Reynaud-Bouret et al. (2011). The setting of Juditsky and Lambert-Lacroix (2004) corresponds
to $s=\infty$, while Reynaud-Bouret et al. (2011) deal with the case of $p=2$ and
$\beta>1/r-1/2$. Both settings rule out the sparse zone. The rates of convergence in the tail and
dense zones obtained in the aforementioned papers are easily recovered from our results.

3. In the context of the Gaussian white noise model on a compact interval Kerkyacharian et al.
(2001) developed an adaptive estimator that achieves the rate of convergence
$(\ln n/n)^{\beta/(2\beta+1)}$ on the anisotropic Nikol'skii classes under the condition
$\sum_{i=1}^d\big[\frac{1}{\beta_i}\big(\frac{p}{r_i}-1\big)\big]_+<2$. This restriction determines
a part of the dense zone, and our Theorem 2 improves on this result. In fact, our estimator
achieves the rate $(\ln n/n)^{\beta/(2\beta+1)}$ in the zone
$\sum_{i=1}^d\frac{1}{\beta_i}\big(\frac{p}{r_i}-1\big)\le 2$, which is equivalent to
$p\le s(2+1/\beta)$.

4. It follows from Theorem 3 that the upper bound of Theorem 2 is sharp in the zone
$p>s(2+1/\beta)$, $s\ge 1$, and it is nearly sharp up to a logarithmic factor in all other zones.
This extra logarithmic factor is a consequence of the fact that we use the pointwise selection
procedure (2.9)-(2.10). We also have an extra $\ln n$-term on the boundaries
$p=\frac{2+1/\beta}{1+1/s}$, $p=s(2+1/\beta)$.

Conjecture 1. The rates found in Theorem 3 are optimal. 

Thus, if our conjecture is true, the construction of an estimator achieving the rates of Theorem 3 
in the tail and dense zones remains an open problem. 

5. Theorem 2 is proved under the assumption $\vec r\in(1,\infty]^d$, i.e., we do not include the
case where $r_j=1$ for some $j=1,\ldots,d$. This is related to the construction of our selection
rule, and to the necessity to bound the $L_{r_j}$-norm, $j=1,\ldots,d$, of the term
$\bar B_h(f,x)$; see (2.12) and (2.13). In our derivations for this purpose we use properties of
the strong maximal operator [for details see Section 6.1], and it is well known that this operator
is not of the weak (1,1)-type in dimensions $d\ge 2$. Nevertheless, using inequality (6.5) we were
able to obtain the following result.

Corollary 1. Let $\vec r$ be such that $r_j=1$ for some $j=1,\ldots,d$. Then the result of
Theorem 2 remains valid if the normalizing factor $(n^{-1}\ln n)^{\nu}$ is replaced by
$(n^{-1}[\ln n]^{d})^{\nu}$.

The proof of Corollary 1 coincides with the proof of Theorem 2 with the only difference that bounds
in the proof of Proposition 1 should use (6.5) instead of the Chebyshev inequality. This will
result in an extra $(\ln n)^{d-1}$-factor. We note that the results of Theorem 2 and Corollary 1
coincide if $d=1$. This is not surprising because in dimension $d=1$ the strong maximal operator is
the Hardy-Littlewood maximal function, which is of the weak (1,1)-type.

4. Tail dominance condition 

Let $g:\mathbb{R}^d\to\mathbb{R}^1$ be a locally integrable function. Define the map
$g\mapsto g^*$ by the formula

$$g^*(x):=\sup_{h\in(0,2]^d}\frac{1}{V_h}\int_{\Pi_h(x)}g(t)\,dt,\qquad x\in\mathbb{R}^d,\qquad(4.1)$$

where $\Pi_h(x)=[x_1-h_1/2,x_1+h_1/2]\times\cdots\times[x_d-h_d/2,x_d+h_d/2]$. In fact, formula
(4.1) defines the maximal operator associated with the differential basis
$\cup_{x\in\mathbb{R}^d}\{\Pi_h(x),\ h\in(0,2]^d\}$, see Guzman (1975).

Consider the following set of functions: for any $\theta\in(0,1)$ and $R\in(0,\infty)$ let

$$G_\theta(R)=\big\{g:\mathbb{R}^d\to\mathbb{R}:\ \|g^*\|_\theta\le R\big\}.\qquad(4.2)$$

Note that, although we keep the previous notation
$\|g\|_\theta=\big(\int|g(x)|^\theta dx\big)^{1/\theta}$, $\|\cdot\|_\theta$ is no longer a norm as
$\theta\in(0,1)$.

The assumption that $f\in G_\theta(R)$ for some $\theta\in(0,1)$ and $R>0$ imposes restrictions on
the tail of the density $f$. In particular, the set of densities, uniformly bounded and compactly
supported on a cube of $\mathbb{R}^d$, is embedded in the set $G_\theta(\cdot)$ for any
$\theta\in(0,1)$ (for details, see Section 7.4). We will refer to the assumption
$f\in G_\theta(R)$ as the tail dominance condition.
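
The embedding of bounded, compactly supported densities can be seen from a short computation. The following is a sketch under the illustrative assumptions that $f$ is supported on the cube $[-a,a]^d$ (the half-width $a$ is a symbol introduced here) and $\|f\|_\infty\le M$; it is not the argument of Section 7.4.

```latex
% Every box \Pi_h(x) with h \in (0,2]^d is contained in x + [-1,1]^d, hence
% f^*(x) = 0 whenever x \notin [-a-1, a+1]^d, while f^*(x) \le M everywhere.
\[
  \|f^*\|_\theta^\theta
  = \int_{[-a-1,\,a+1]^d} \big(f^*(x)\big)^\theta \, dx
  \le M^\theta (2a+2)^d
  \quad\Longrightarrow\quad
  f \in G_\theta(R) \ \text{ with } \ R = M\,(2a+2)^{d/\theta}.
\]
```

The restriction of the bandwidths to $(0,2]^d$ is what keeps $f^*$ compactly supported here; with unbounded boxes the maximal function of a density is not $\theta$-integrable for small $\theta$.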

In this section we study the problem of adaptive density estimation under the tail dominance
condition. We show that under this condition the minimax rate of convergence can be essentially
improved in the tail zone. In particular, if $\theta\le\theta^*$ for some $\theta^*<1$ given below
then the tail zone disappears.

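For any $\theta\in(0,1)$ let

$$\nu^*(\theta)=\min\Big\{\frac{1-\theta/p}{1-\theta/s+1/\beta},\ \frac{1}{2+1/\beta}\Big\},$$

and define

$$\nu(\theta)=\begin{cases}
\nu^*(\theta), & p\le s(2+1/\beta),\\
\nu, & p>s(2+1/\beta),
\end{cases}\qquad
\mu_n(\theta)=\begin{cases}
(\ln n)^{1/p}, & p\in\Big\{\dfrac{2+1/\beta}{1/\theta+1/s},\ s(2+1/\beta)\Big\},\\[6pt]
1, & \text{otherwise},
\end{cases}\qquad(4.3)$$

where $\nu$ is defined in (3.4).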
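
The effect of the tail dominance index $\theta$ on the exponent can be explored with a few lines of Python. This is a sketch with hypothetical function names; it takes $\nu^*(\theta)$ to be the smaller of the two exponents, the reading under which the dense rate $1/(2+1/\beta)$ is attained exactly when $p\ge(2+1/\beta)/(1/\theta+1/s)$, and it parametrizes by `inv_s` $=1/s$ so that $s=\infty$ is covered.

```python
def tail_exponent(theta, p, beta, inv_s=0.0):
    """(1 - theta/p) / (1 - theta/s + 1/beta), with inv_s = 1/s."""
    return (1.0 - theta / p) / (1.0 - theta * inv_s + 1.0 / beta)


def nu_star(theta, p, beta, inv_s=0.0):
    """Exponent for p <= s(2 + 1/beta) under tail dominance with index theta."""
    dense = 1.0 / (2.0 + 1.0 / beta)
    return min(tail_exponent(theta, p, beta, inv_s), dense)


def theta_star(p, beta, inv_s=0.0):
    """Largest theta for which the dense exponent is attained at this p.

    Equals p*s / (s*(2 + 1/beta) - p); meaningful for p < s*(2 + 1/beta).
    """
    return p / ((2.0 + 1.0 / beta) - p * inv_s)
```

For $\beta=1$, $s=\infty$ and $p=2$ one gets $\theta^*=2/3$: any $\theta\le 2/3$ gives the dense exponent $1/3$, $\theta=0.9$ gives $0.275$, and letting $\theta\to 1$ recovers the tail exponent $1/4$ of (3.4).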
Theorem 4. The following statements hold.

(i) For any $\theta\in(0,1)$ and $R>0$, Theorem 2 remains valid if one replaces
$N_{\vec r,d}(\vec\beta,\vec L,M)$ by $G_\theta(R)\cap N_{\vec r,d}(\vec\beta,\vec L,M)$, $\nu$ by
$\nu(\theta)$ and $\mu_n$ by $\mu_n(\theta)$. The constant $C$ may depend on $\theta$ and $R$.

(ii) For any $\theta\in(0,1)$, $\vec\beta,\vec L\in(0,\infty)^d$, $\vec r\in[1,\infty]^d$ and
$M>0$ one can find $R>0$ such that Theorem 3 remains valid if one replaces
$N_{\vec r,d}(\vec\beta,\vec L,M)$ by $G_\theta(R)\cap N_{\vec r,d}(\vec\beta,\vec L,M)$, $\nu$ by
$\nu(\theta)$, and $\mu_n$ by $\mu_n(\theta)$.

Remark 3.

1. The tail dominance condition leads to improvement of the rates of convergence in the whole
tail zone. In particular, under this condition the faster convergence rate of the dense zone is
achieved over a wider range of values of $p$,
$\frac{2+1/\beta}{1/\theta+1/s}\le p\le s(2+1/\beta)$. Moreover, if

$$\theta\le\theta^*:=\frac{ps}{s(2+1/\beta)-p},$$

then the tail zone disappears. Note that $\theta^*\in(0,1)$ whenever
$p<\frac{2+1/\beta}{1+1/s}$.

2. We would like to emphasize that the couple $(\theta,R)$ is not used in the construction of the
estimation procedure; thus, our estimator is adaptive with respect to $(\theta,R)$ as well. In
particular, if the tail dominance condition does not hold, our estimator achieves the rate of
Theorem 2. On the other hand, if this assumption holds, the rate of convergence is improved
automatically in the tail zone.
3. The second statement of the theorem is proved under the assumption that $R$ is large enough. The
fact that $R$ cannot be chosen arbitrarily small is not technical; it is related to the dependence
between the parameters $\vec\beta$, $\vec L$, $\vec r$, $M$ and $R$. In particular, one can easily provide lower bounds on
$R$ in terms of the other parameters of the class. For instance, by the Lebesgue differentiation
theorem, $f(x)\le f^*(x)$ almost everywhere; therefore for any density $f\in\mathbb{G}_\theta(R)$ such that
$\|f\|_\infty\le M$ one has

\[ 1=\int f\le M^{1-\theta}\|f^*\|_\theta^\theta\le M^{1-\theta}R^\theta\quad\Longrightarrow\quad R\ge M^{1-1/\theta}. \]

Another lower bound on $R$ in terms of $\vec L$, $\vec r$ and $\theta$ can be established using the Littlewood
interpolation inequality [see, e.g., (Garling 2007, Section 5.5)]. Let $0<q_0<q_1$ and $\alpha\in(0,1)$
be arbitrary numbers; then the Littlewood inequality states that $\|g\|_q\le\|g\|_{q_0}^{\alpha}\|g\|_{q_1}^{1-\alpha}$, where
$q$ is defined by the relation $\frac1q=\frac{\alpha}{q_0}+\frac{1-\alpha}{q_1}$. Now, suppose that $f\in\mathbb{G}_\theta(R)\cap\mathbb{N}_{\vec r,d}(\vec\beta,\vec L)$, and choose
$q_0=\theta$, $q_1=r_i$ and $\alpha=\frac{\theta(r_i-1)}{r_i-\theta}$; then $q=1$ and

\[ 1=\|f\|_1\le R^{\alpha}L_i^{1-\alpha},\quad i=1,\dots,d\quad\Longrightarrow\quad R\ge\max_{i=1,\dots,d}L_i^{-\frac{r_i(1-\theta)}{\theta(r_i-1)}}. \]



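Now we argue that the condition $f\in\mathbb{G}_{\theta^*}(R)$ is, in a sense, the weakest possible condition that
ensures the "usual" rate of convergence, corresponding to the index $\nu=\beta/(2\beta+1)$, in the whole zone
$p<s(2+1/\beta)$. Let

\[ \psi_n(\theta) = \inf_{\tilde f}\mathcal{R}_p^{(n)}\big[\tilde f;\ \mathbb{G}_\theta(R)\cap\mathbb{N}_{\vec r,d}(\vec\beta,\vec L,M)\big],\qquad
\bar\psi_n(\theta) = \inf_{\tilde f}\mathcal{R}_p^{(n)}\big[\tilde f;\ \mathbb{N}_{\vec r,d}(\vec\beta,\vec L,M)\setminus\mathbb{G}_\theta(R)\big] \]

denote the minimax rates of convergence on the classes $\mathbb{G}_\theta(R)\cap\mathbb{N}_{\vec r,d}(\vec\beta,\vec L,M)$ and $\mathbb{N}_{\vec r,d}(\vec\beta,\vec L,M)\setminus\mathbb{G}_\theta(R)$
respectively. Then Theorem 4 implies that

\[ c(L_\beta/n)^{\frac{\beta}{2\beta+1}}\le\psi_n(\theta)\le C(L_\beta\ln n/n)^{\frac{\beta}{2\beta+1}},\quad\forall\theta<\theta^*,\ \forall R>0, \]
\[ c(L_\beta/n)^{\frac{\beta}{2\beta+1}}\le\psi_n(\theta^*)\le C(\ln n)^{1/p}(L_\beta\ln n/n)^{\frac{\beta}{2\beta+1}},\quad\forall R>0. \]

On the other hand, if $p<\frac{2+1/\beta}{1+1/s}$ then

\[ c(L_\beta/n)^{\frac{1-1/p}{1-1/s+1/\beta}}\le\bar\psi_n(\theta)\le C(L_\beta\ln n/n)^{\frac{1-1/p}{1-1/s+1/\beta}},\quad\forall\theta\in(0,1),\ \forall R>0. \tag{4.4} \]

The upper bound in (4.4) is one of the statements of Theorem 2, while the lower bound follows
from the fact that the worst-case functions, on which the lower bound of Theorem 3 in the tail
zone is attained, do not belong to any class $\mathbb{G}_\theta(R)$; for details see Section 7.5.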

Thus if we consider the family of functional classes $\{\mathbb{G}_\theta(R)\cap\mathbb{N}_{\vec r,d}(\vec\beta,\vec L,M)\}_{\theta,R}$ we can assert
that the "usual" convergence rate corresponding to the index $\nu=\beta/(2\beta+1)$ holds in the whole
zone $p<s(2+1/\beta)$ if and only if $\theta\le\theta^*$. Note the obvious inclusion

\[ \mathbb{G}_\theta(R)\cap\{g:\|g\|_\infty\le M\}\subset\mathbb{G}_{\theta'}(R')\cap\{g:\|g\|_\infty\le M\},\quad\theta<\theta', \]

where $R'=M^{1-\theta/\theta'}R^{\theta/\theta'}$. This fact together with Theorem 4 implies that there is no tail zone in the
problem of estimating the density $f$ on the class $\mathbb{G}_{\theta^*}(R)\cap\mathbb{N}_{\vec r,d}(\vec\beta,\vec L,M)$. On the other hand, the lower
bound in (4.4) implies that the tail zone exists while estimating $f$ on the class $\mathbb{N}_{\vec r,d}(\vec\beta,\vec L,M)\setminus\mathbb{G}_\theta(R)$
for any $\theta\in(0,1)$. In this sense $f\in\mathbb{G}_{\theta^*}(R)\cap\mathbb{N}_{\vec r,d}(\vec\beta,\vec L,M)$ is the necessary and sufficient condition
eliminating the tail zone.
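The role of the threshold $\theta^*$ can be checked numerically. The sketch below is an illustration under stated assumptions: it treats $\beta$, $s$ and $p$ as given scalars (their definitions in terms of the class parameters are in Section 3 and are not reproduced here), it takes the tail/dense boundary to be $p=(2+1/\beta)/(1/\theta+1/s)$, and it takes the rate index in the zone $p<s(2+1/\beta)$ to be the smaller of the tail-type exponent $(1-\theta/p)/(1-\theta/s+1/\beta)$ and the dense-type exponent $1/(2+1/\beta)$, which is what the two-sided bounds displayed above require. The function names are hypothetical.

```python
def tail_dense_boundary(theta, beta, s):
    """Value of p separating the tail zone (below) from the dense zone."""
    return (2.0 + 1.0 / beta) / (1.0 / theta + 1.0 / s)


def nu_star(theta, p, beta, s):
    """Rate index for p < s(2 + 1/beta) under tail dominance with index theta."""
    tail_exponent = (1.0 - theta / p) / (1.0 - theta / s + 1.0 / beta)
    dense_exponent = 1.0 / (2.0 + 1.0 / beta)
    return min(tail_exponent, dense_exponent)


def theta_star(p, beta, s):
    """Largest theta for which p lies at or above the tail/dense boundary."""
    return p * s / (s * (2.0 + 1.0 / beta) - p)


if __name__ == "__main__":
    beta, s, p = 1.0, 2.0, 1.5
    t = theta_star(p, beta, s)
    print("theta* =", t)
    print("boundary at theta* =", tail_dense_boundary(t, beta, s))
    for theta in (0.5, t, 1.0):
        print(theta, nu_star(theta, p, beta, s))
```

With these example values the index equals the dense-zone value $1/3$ for every $\theta\le\theta^*=2/3$ and drops to $2/9$ at $\theta=1$, the value of the tail zone without any tail condition.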

As was mentioned above, the set of uniformly bounded densities that are compactly supported on a cube
of $\mathbb{R}^d$ is embedded in the set $\mathbb{G}_\theta(\cdot)$ for any $\theta\in(0,1)$. This fact explains why the tail
zone does not appear in problems of estimating compactly supported densities. Another interesting
observation is related to the specific case $p=1$. Recall that the condition $f\in\mathbb{N}_{\vec r,d}(\vec\beta,\vec L,M)$ alone
is not sufficient for the existence of consistent estimators. However, for any $\theta\in(0,1)$ we can show

, i-e 

ininr'lf; Gg(R)nnr^diP,L,M)] < C ^ ^0, n^oo. 

f ' L ^ J 

This result follows from the proof of Theorem 4 and from (6.5). 
5. Proof of Theorem 1 

First we state two auxiliary results, Lemmas 1 and 2, and then turn to the proof of the theorem. 
Proof of measurability of our estimator and proofs of Lemmas 1 and 2 are given in Appendix A. 

5.1. Auxiliary lemmas 

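For any $g:\mathbb{R}^d\to\mathbb{R}$ denote

\[ \tilde M_h(g,x) = \sqrt{\frac{\varkappa\hat A_h(g,x)\ln n}{nV_h}}+\frac{\varkappa\ln n}{nV_h}. \]

Lemma 1. Let $\chi_h(g,x) = \big[|\hat A_h(g,x)-A_h(g,x)|-M_h(g,x)\big]_+$, $h\in\mathcal{H}$; then

\[ \big[\tilde M_h(g,x)-5M_h(g,x)\big]_+\le\tfrac12\chi_h(g,x),\qquad\big[M_h(g,x)-4\tilde M_h(g,x)\big]_+\le2\chi_h(g,x). \]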

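The next lemma establishes moment bounds on the following four random variables:

\[ \begin{aligned}
\zeta_1(x) &:= \sup_{h\in\mathcal{H}}\big[|\xi_h(x)|-M_h(K,x)\big]_+;\\
\zeta_2(x) &:= \sup_{h,\eta\in\mathcal{H}}\big[|\xi_{h,\eta}(x)|-M_{h\vee\eta}(Q,x)\big]_+;\\
\zeta_3(x) &:= \sup_{h\in\mathcal{H}}\big[|\hat A_h(K,x)-A_h(K,x)|-M_h(K,x)\big]_+;\\
\zeta_4(x) &:= \sup_{h\in\mathcal{H}}\big[|\hat A_h(Q,x)-A_h(Q,x)|-M_h(Q,x)\big]_+.
\end{aligned} \tag{5.1} \]

Denote $k_\infty=\|K\|_\infty\vee1$ and

\[ F(x) = \int\mathbf{1}_{[-1,1]^d}(t-x)f(t)\,dt. \]

Lemma 2. Let $q\ge1$, $l\ge1$ be arbitrary numbers. If $\varkappa\ge k_\infty^2[(2q+4)d+2l]$ then for all $x\in\mathbb{R}^d$

\[ \mathbb{E}_f[\zeta_j(x)]^q\le C_0n^{-q/2}\{F(x)\vee n^{-l}\},\quad j=1,2,3,4, \tag{5.2} \]

where the constant $C_0$ depends on $d$, $q$, and $k_\infty$ only.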
5.2. Proof of oracle inequality (2.13)

We recall the standard error decomposition of the kernel estimator: for any $h\in\mathcal{H}$ one has

\[ |\hat f_h(x)-f(x)|\le|B_h(f,x)|+|\xi_h(x)|, \]

where $B_h(f,x)$ and $\xi_h(x)$ are given in (2.11) and (2.4) respectively. A similar error decomposition
holds for the auxiliary estimators $\hat f_{h,\eta}(x)$; the corresponding bias and stochastic error are denoted by
$B_{h,\eta}(f,x)$ and $\xi_{h,\eta}(x)$.

1°. The following relation for the bias $B_{h,\eta}(f,x)$ of $\hat f_{h,\eta}(x)$ holds:

\[ B_{h,\eta}(f,x)-B_\eta(f,x) = \int K_\eta(t-x)B_h(f,t)\,dt,\quad\forall h,\eta\in\mathcal{H}. \tag{5.3} \]

Indeed, using the Fubini theorem and the fact that $\int K_h(x)dx=1$ for all $h\in\mathcal{H}$ we have

\[ \begin{aligned}
\int[K_h*K_\eta](t-x)f(t)\,dt &= \int\Big[\int K_h(t-y)K_\eta(y-x)\,dy\Big]f(t)\,dt\\
&= \int K_\eta(y-x)f(y)\,dy+\int K_\eta(y-x)\Big[\int K_h(t-y)[f(t)-f(y)]\,dt\Big]dy.
\end{aligned} \]

It remains to note that $\int K_h(t-y)[f(t)-f(y)]\,dt=B_h(f,y)$ and to subtract $f(x)$ from both
sides of the above equality. Thus, (5.3) is proved.



2°. By the triangle inequality we have for any $h\in\mathcal{H}$

\[ |\hat f_{\hat h}(x)-f(x)|\le|\hat f_{\hat h}(x)-\hat f_{\hat h,h}(x)|+|\hat f_{\hat h,h}(x)-\hat f_h(x)|+|\hat f_h(x)-f(x)|. \tag{5.4} \]

We bound each term on the right hand side separately.
First we note that, by (5.3) and (2.12), for any $h\in\mathcal{H}$

Rh{x) — sup Mri{Q,x) — Mh{K,x) = sup 

ri>h rieH 



<Bh{f,x) + sup \^h,^{x) - Cnix)\ - Mhvrj{Q,x) - M^{K,x) 



Thus, for any h & Ti 



Rh{x) < Bh{f,x) + 2C{x)+Mh{K,x)+ sup M^{Q,x) 

ri>h 



(5.5) 



where we put 



C(x) := sup [\ChA^)\- Mhvr^{Q,x)] V sup [\Ch{x)\ - MhiK , x)] 
Second, by (5.3) and fh^n = fri,h for any h,r] £ 7i we have 

\fh,r,{x) - fh{x)\ < \Br,^h{f,x)-Bh{f,x)\ + \^h,r,{x)-Ux)\ 

< Br,{f, x) + [\^h,v{^) - 4(x)| - Mh^^^iQ, x) - Mh{K, x)] + sup M^(Q, x) + Mh{K, x) 

r]>h 



< B.,{f,x) + 2ax) + Rh{x) 
where the last inequality holds by definition of $R_h(x)$ [see (2.9)]. These inequalities imply the
following upper bound on the first term on the right hand side of (5.4): for any $h\in\mathcal{H}$

\fh,hi^) - fd^)\ < BhU.x) + R~^{x) + 2C{x) 

< Bh{f,x) + Rh{x) + 2C{x) 

< 2Bh{f,x)+4C{x)+ sup M^{Q,x) + Mh{K,xy, (5.6) 

ri>h 

where we have used the fact that $R_{\hat h}(x)\le R_h(x)$ for all $h\in\mathcal{H}$, and inequality (5.5).

Now we turn to bounding the second term on the right hand side of (5.4). We get for any $h\in\mathcal{H}$

I4h(^)-A(^)l = lA,^(x)-A(x)|±[M^^,(Q,x) + M,(K,x)] 

< Rf^{x)+ sup Mr,{Q,x) + Mh{K,x) 

r]>h 

< Bh{f,x) + 2C{x) + 2supM^{Q,x) + 2Mh{K,x), (5.7) 

where we again used (5.5) and the fact that $R_{\hat h}(x)\le R_h(x)$ for all $h\in\mathcal{H}$.
Finally, for any $h\in\mathcal{H}$

\h{x)-f{x)\ < \Bh{f,x)\ + Mx)\<Bh{f,x) + Mh{K,x) + C{x). 

Thus, combining (5.6), (5.7) and (5.4) we obtain 

I4(x)-/(x)| < inf \ABh{f,x) + 3snpMr,{Q,x) + 3Mh{K,x) + Mh{K,x)}+6C{x) + C{x). (5.8) 
hen ^ ri>h J 

3°. In order to complete the proof we note that by the first inequality of Lemma 1 for any $g:\mathbb{R}^d\to\mathbb{R}$

\[ \hat M_h(g,x)\le20M_h(g,x)+2\chi_h(g,x). \]

In addition, by the second inequality in Lemma 1

\[ \begin{aligned}
|\xi_h(x)|-\hat M_h(K,x) &= |\xi_h(x)|-M_h(K,x)+M_h(K,x)-\hat M_h(K,x)\le\zeta(x)+2\chi(x),\\
|\xi_{h,\eta}(x)|-\hat M_{h\vee\eta}(Q,x) &= |\xi_{h,\eta}(x)|-M_{h\vee\eta}(Q,x)+M_{h\vee\eta}(Q,x)-\hat M_{h\vee\eta}(Q,x)\le\zeta(x)+2\chi(x),
\end{aligned} \]

so that $\hat\zeta(x)\le\zeta(x)+2\chi(x)$. Substituting these bounds in (5.8) we obtain

\[ |\hat f(x)-f(x)|\le\inf_{h\in\mathcal{H}}\Big\{4B_h(f,x)+60\sup_{\eta\ge h}M_\eta(Q,x)+61M_h(K,x)\Big\}+7\zeta(x)+18\chi(x), \]

as claimed. ■



5.3. Proof of moment bounds (2.16)

Let $\zeta_j(x)$, $j=1,\dots,4$, be defined by (5.1). Then

\[ \mathbb{E}_f[\zeta_j(x)]^q\le C_0n^{-q/2}\{F(x)\vee n^{-l}\}, \]

as claimed in Lemma 2.

Let $T_1=\{x\in\mathbb{R}^d:F(x)\ge n^{-l}\}$ and $T_2=\mathbb{R}^d\setminus T_1$. Therefore

\[ \int_{T_1}\mathbb{E}_f[\zeta_j(x)]^q\,dx\le C_0n^{-q/2}\int_{T_1}F(x)\,dx\le C_0n^{-q/2}\int F(x)\,dx = 2^dC_0n^{-q/2}. \tag{5.9} \]
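The identity $\int F(x)\,dx=2^d$ used in (5.9) follows from the Fubini theorem, since $F(x)$ is the probability that an observation falls in the cube $[x-1,x+1]^d$. A minimal numerical sketch for $d=1$, assuming for illustration that $f$ is the uniform density on $[0,1]$ (the density and the quadrature grid are choices made for this example only):

```python
def F(x):
    """F(x) = P(X in [x - 1, x + 1]) for X uniform on [0, 1]."""
    return max(0.0, min(1.0, x + 1.0) - max(0.0, x - 1.0))


def integral_of_F(lo=-1.5, hi=2.5, steps=40_000):
    """Midpoint-rule approximation of the integral of F over [lo, hi].

    F vanishes outside [-1, 2], so the interval [lo, hi] captures all of it.
    """
    width = (hi - lo) / steps
    return sum(F(lo + (k + 0.5) * width) for k in range(steps)) * width


if __name__ == "__main__":
    print(integral_of_F())  # close to 2 = 2^d with d = 1
```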

Now we analyze integrability on the set $T_2$. We consider only the cases $j=1,2$ since the computations
for $j=3,4$ are the same as for $j=1$.

Let $U_{\max}(x)=[x-1,x+1]^d$ and define the event $D(x)=\big\{\sum_{i=1}^n\mathbf{1}[X_i\in U_{\max}(x)]<2\big\}$, and let
$\bar D(x)$ denote the complementary event. First we argue that for $j=1,2$

\[ \zeta_j(x)\mathbf{1}\{D(x)\}=0,\quad\forall x\in T_2. \tag{5.10} \]

Indeed, if $x\in T_2$ then for any $h\in\mathcal{H}$

\[ |\mathbb{E}_fK_h(X_1-x)|\le n^dk_\infty F(x)\le k_\infty n^{d-l},\qquad|\mathbb{E}_fQ_h(X_1-x)|\le n^dk_\infty^2F(x)\le k_\infty^2n^{d-l}. \]

Here we have used that $\mathcal{H}=[1/n,1]^d$ and that $\mathrm{supp}(K)=[-1/2,1/2]^d$, $\mathrm{supp}(Q)=[-1,1]^d$.
Hence, by definition of $\xi_h(x)$, for any $h\in\mathcal{H}$ one has for any $l\ge d+1$



I 1 " 

MxmD{x)} < \-Y,KhiX,-x)l{D{x)}+k^n''-' 



1=1 



nVh nVh 

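where we have used that $n^{d-l}\le(nV_h)^{-1}$ for $l\ge d+1$, $\varkappa\ln n\ge4k_\infty$ by the condition on $\varkappa$ [see also
definition of $M_h(K,x)$], and $n\ge3$. Therefore $\zeta_1(x)\mathbf{1}\{D(x)\}=0$ for $x\in T_2$. By the same reasoning
for $\zeta_2(x)$ we obtain that $\zeta_2(x)\mathbf{1}\{D(x)\}=0$, $\forall x\in T_2$, because $\varkappa\ln n\ge4k_\infty^2$. Thus (5.10) is proved.
Using (5.10) we can write

\[ \begin{aligned}
\int_{T_2}\mathbb{E}_f[\zeta_j(x)]^q\mathbf{1}\{\bar D(x)\}\,dx &\le\int_{T_2}\mathbb{E}_f\Big(\Big[\sup_{h\in\mathcal{H}}|\xi_h(x)|^q\vee\sup_{h,\eta\in\mathcal{H}}|\xi_{h,\eta}(x)|^q\Big]\mathbf{1}\{\bar D(x)\}\Big)dx\\
&\le(2k_\infty^2n^d)^q\int_{T_2}\mathbb{P}_f\{\bar D(x)\}\,dx.
\end{aligned} \tag{5.11} \]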

Now we bound from above the integral on the right hand side of the last display formula. For any
$z>0$ we have in view of the exponential Markov inequality

\[ \begin{aligned}
\mathbb{P}_f\{\bar D(x)\} &= \mathbb{P}_f\Big\{\sum_{i=1}^n\mathbf{1}[X_i\in U_{\max}(x)]\ge2\Big\}\le e^{-2z}\big[e^zF(x)+1-F(x)\big]^n\\
&= e^{-2z}\big[(e^z-1)F(x)+1\big]^n\le\exp\{-2z+n(e^z-1)F(x)\}.
\end{aligned} \]

Minimizing the right hand side w.r.t. $z$ we find $z=\ln2-\ln\{nF(x)\}$ and, therefore,

\[ \mathbb{P}_f\{\bar D(x)\}\le4^{-1}n^2F^2(x)\exp\{2-nF(x)\}\le(e^2/4)n^2F^2(x). \]
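The exponential Markov bound above can be compared with the exact probability that a binomial count is at least two. The following sketch is an illustration only: the function names are hypothetical, and the optimal $z=\ln2-\ln\{nF\}$ is positive only when $nF<2$, which holds on $T_2$ where $F(x)<n^{-l}$.

```python
import math


def prob_at_least_two(n, F):
    """Exact P(Bin(n, F) >= 2) = 1 - P(count = 0) - P(count = 1)."""
    return 1.0 - (1.0 - F) ** n - n * F * (1.0 - F) ** (n - 1)


def optimised_markov_bound(n, F):
    """exp(-2z + n(e^z - 1)F) evaluated at z = ln 2 - ln(nF)."""
    return 0.25 * (n * F) ** 2 * math.exp(2.0 - n * F)


def crude_bound(n, F):
    """Final bound (e^2 / 4) * n^2 * F^2 used in the proof."""
    return (math.e ** 2 / 4.0) * (n * F) ** 2


if __name__ == "__main__":
    for n, F in ((10, 0.1), (50, 0.03), (100, 1e-6)):
        print(n, F, prob_at_least_two(n, F),
              optimised_markov_bound(n, F), crude_bound(n, F))
```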
Since $F(x)\le n^{-l}$ for any $x\in T_2$ we obtain

\[ \int_{T_2}\mathbb{P}_f\{\bar D(x)\}\,dx\le(e^2/4)n^{2-l}\int F(x)\,dx=2^d(e^2/4)n^{2-l}. \]

Combining this inequality with (5.11) we obtain

\[ \int_{T_2}\mathbb{E}_f[\zeta_j(x)]^q\mathbf{1}\{\bar D(x)\}\,dx\le2^d(2k_\infty^2)^q(e^2/4)n^{2+dq-l}. \tag{5.12} \]

Choosing $l=(d+1)q+2$ we come to the assertion of the theorem in view of (5.9) and (5.12). ■



6. Proofs of Theorem 2 and statement (i) of Theorem 4 

The proofs of Theorem 2 and of statement (i) of Theorem 4 go along similar lines. That is why we 
state our auxiliary results (Propositions 1 and 2) in the form that is suitable for the use in the proof 
of Theorem 4. For this purpose it will be convenient to extend the definition of the class $\mathbb{G}_\theta(R)$ [see
(4.2)] to the case $\theta=1$. In the sequel by $\mathbb{G}_1(R)$ we mean the set of all probability densities on $\mathbb{R}^d$,
no matter what the value of $R$ is.

This section is organized as follows. First, in Subsection 6.1 we present and discuss some facts 
from functional analysis. Then in Lemma 3 of Subsection 6.2 we state an auxiliary result on ap- 
proximation properties of the kernel K defined in (3.3). Proof outline and notation are discussed 
in Subsection 6.3. Subsection 6.4 presents two auxiliary propositions, and the proofs of Theorem 2 
and statement (i) of Theorem 4 are completed in Subsections 6.5 and 6.6. Proofs of the auxiliary 
results, Lemma 3 and Propositions 1 and 2 are given in Appendix B. 

In the subsequent proof $c_i$, $C_i$, $\bar c_i$, $\bar C_i$, $\hat c_i$, $\hat C_i$, $\tilde c_i$, $\tilde C_i$, ..., stand for constants that can depend on $L_0$,
$M$, $\vec\beta$, $\vec r$, $d$ and $p$, but are independent of $\vec L$ and $n$. These constants can be different on different
appearances. In the case when the assumption $f\in\mathbb{G}_\theta(R)$ with $\theta\in(0,1)$ is imposed, they may also
depend on $\theta$ and $R$.

6.1. Preliminaries

We present an embedding theorem for the anisotropic Nikol'skii classes and discuss some properties 
of the strong maximal operator. 

6.1.1. Embedding theorem 

The statement given below in (6.2) is a particular case of the embedding theorem for anisotropic 
Nikol'skii classes N^,d(/3,L); see (Nikol'skii 1977, Section 6.9.1.). 
For the fixed class parameters (3 and r define 



and put 




Let $\tau(p)>0$ and $\tau_i>0$ for all $i=1,\dots,d$; then for any $p\ge1$ one has

\[ \mathbb{N}_{\vec r,d}(\vec\beta,\vec L)\subset\mathbb{N}_{\vec q,d}(\vec\gamma,c\vec L), \tag{6.2} \]

where the constant $c>0$ is independent of $\vec L$ and $p$.
6.1.2. Strong maximal function

Let $g:\mathbb{R}^d\to\mathbb{R}$ be a locally integrable function. We define the strong maximal function $g^\star$ of $g$ by
the formula

\[ g^\star(x) := \sup_H\frac{1}{|H|}\int_Hg(t)\,dt,\quad x\in\mathbb{R}^d, \tag{6.3} \]

where the supremum is taken over all possible rectangles $H$ in $\mathbb{R}^d$ with sides parallel to the
coordinate axes, containing the point $x$. It is worth noting that the Hardy-Littlewood maximal function is
defined by (6.3) with the supremum taken over all cubes with sides parallel to the coordinate axes,
centered at $x$.

It is well known that the strong maximal operator $g\mapsto g^\star$ is of the strong $(p,p)$-type for all
$1<p\le\infty$, i.e., if $g\in L_p(\mathbb{R}^d)$ then $g^\star\in L_p(\mathbb{R}^d)$ and there exists a constant $C$ depending on $p$ only
such that

\[ \|g^\star\|_p\le C\|g\|_p,\quad p\in(1,\infty]. \]

Let $g^*$ be defined in (4.1). Since obviously $g^*(x)\le g^\star(x)$ for all $x\in\mathbb{R}^d$ we have

\[ \|g^*\|_p\le C\|g\|_p,\quad p\in(1,\infty]. \tag{6.4} \]

In distinction to the Hardy-Littlewood maximal function, the strong maximal operator is not of
the weak $(1,1)$-type. In fact, the following statement holds: there exists a constant $C$ depending on
$d$ only such that

\[ |\{x:g^\star(x)\ge\alpha\}|\le C\int\frac{|g(x)|}{\alpha}\Big\{1+\Big(\ln_+\frac{|g(x)|}{\alpha}\Big)^{d-1}\Big\}dx,\quad\forall\alpha>0. \tag{6.5} \]

We refer to Guzman (1975) for more details.
6.2. Approximation properties of kernel K

The next lemma establishes an upper bound on the norm of the bias $B_h(f,\cdot)$ of the kernel estimator $\hat f_h$
when $f$ belongs to the anisotropic Nikol'skii class.

Lemma 3. Let $f\in\mathbb{N}_{\vec r,d}(\vec\beta,\vec L)$. Let $\hat f_h$ be the estimator (2.1) associated with kernel (3.3) with
$\ell>\max_{j=1,\dots,d}\beta_j$. Then $B_h(f,x)$ can be represented as the sum $B_h(f,x)=\sum_{j=1}^dB_{h,j}(f,x)$ with
functions $B_{h,j}(f,x)$ satisfying the following inequalities:

\[ \|B_{h,j}(f,\cdot)\|_{r_j}\le C_1L_jh_j^{\beta_j},\quad\forall j=1,\dots,d. \tag{6.6} \]

Moreover, if $s\ge1$, then for any $p\ge1$

\[ \|B_{h,j}(f,\cdot)\|_{q_j}\le C_2L_jh_j^{\gamma_j},\quad\forall j=1,\dots,d, \tag{6.7} \]

where $\vec\gamma=\vec\gamma(p)$ and $\vec q=\vec q(p)$ are defined in (6.1). Here $C_1$ and $C_2$ are constants independent of $\vec L$
and $p$.

6.3. Proof outline and notation

The starting point of our proof is the pointwise oracle inequality (2.13) together with the moment
bound (2.16). Denote

\[ \bar U_f(x) = \inf_{h\in\mathcal{H}}\Big\{\bar B_h(f,x)+\sup_{\eta\ge h}M_\eta(K\vee Q,x)\Big\}; \tag{6.8} \]

then, taking into account that $M_\eta(K\vee Q,x)$ is greater than $M_\eta(K,x)$ and $M_\eta(Q,x)$ for any $x$ and
$\eta$ [see (2.5) and (2.8)], and using (2.13), we have

\[ |\hat f(x)-f(x)|\le c_0\big[\bar U_f(x)+\omega(x)\big], \]

where $c_0$ is an absolute constant, and $\omega(x):=\zeta(x)+\chi(x)$ with $\zeta(x)$ and $\chi(x)$ defined in (2.14) and
(2.15). Therefore, by (2.16) applied with $q=p$ and by the Fubini theorem, there exists a constant
$\bar c_0>0$ such that for any probability density $f$ and any Borel set $\mathcal{A}\subseteq\mathbb{R}^d$ one has

\[ \mathbb{E}_f\int_{\mathcal{A}}|\hat f(x)-f(x)|^p\,dx\le\bar c_0\Big[\int_{\mathcal{A}}\bar U_f^p(x)\,dx+n^{-p/2}\Big]. \tag{6.9} \]



Recall that koo = ||-f^||oo V 1; by definition of Bh{f, x) [see (2.12)] and by Lemma 3 one has 

d 

Bh{f,x)<k^Y.KAf^^)^ 
where BJ^j{f,x) is the strong maximal function of \Bhj{f,x)\, j = 1, . . . ,d. Therefore if we let 

Uf{x) := ini \ max 5f •(/, x) + sup Af„(K V Q, x)|, (6.10) 

hen I j=l,...,d ) 

then 

Uf{x) < k^Uf{x), Vx G R'^. (6.11) 
The key element of the proof is derivation of upper bounds on the integral 



J ■■= [ C/?(x)dx. 



These bounds will be established by division of M'^ in "slices" , and appropriate choice of bandwidth 
h £ 7i on every "slice". For this purpose the following bounds on norms of j(/, •) will be used. 
Inequality (6.4) and the first assertion of Lemma 3 imply that for any p > 1, f G (1, oo]'^ and any 
/ G Nf?^d(/3', L) one has 

\\Bl,{f, OIL, < ciLjhf, yj = l,...,d, (6.12) 

Moreover, if s > 1 then, by the second assertion of Lemma 3, for any p > 1, f G (l,oo]'^ and 
/GN,-,4/3,L) 

\\Blj{f,-)\\^^<C2L,h]^, Vj = l,...,d. (6.13) 
Let $\delta:=\ln n/n$, $\varphi:=(L_\beta\delta)^{\beta/(2\beta+1)}$. Let $m_0(\theta)$, $\theta\in(0,1]$, be an integer number to be specified
later; see (6.19) below. For $m\in\mathbb{Z}$, $m\ge m_0(\theta)$, define "slices"

\[ \mathcal{X}_m := \{x\in\mathbb{R}^d:2^m\varphi<U_f(x)\le2^{m+1}\varphi\},\qquad\mathcal{X}^-_{m_0(\theta)} := \{x\in\mathbb{R}^d:U_f(x)\le2^{m_0(\theta)}\varphi\}, \]

and consider the corresponding integrals 

Jni ■= / U^{x)dx, / _ U^{x)dx. 

With this notation, using (6.9) and (6.11) we can write 

< I \f{x)-f{x)fdx + ci J2 ^/(^)d^ + ^2^""^' 



m.Q(0) ni=m(}(9) ' 

oo 

=■■ J^oie) + ^1 E '^'^ + ^2^-^/^. (6.14) 

m='mo{6) 

The rest of the proof consists of bounding the integrals >/^g(g) and on the right hand side of 
(6.14) and combining these bounds in different zones. 

The following notation will be used in the subsequent proof. For the sake of brevity we will write 

\[ M_\eta(x) := M_\eta(K\vee Q,x),\qquad A_\eta(x) := A_\eta(K\vee Q,x),\quad\forall\eta\in\mathcal{H}. \]

We let $I:=\{1,\dots,d\}$, and

\[ I_+ := \{j\in I:p\le r_j<\infty\},\qquad I_- := \{j\in I:1\le r_j<p\},\qquad I_\infty := \{j\in I:r_j=\infty\}. \]

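With $\vec\gamma=(\gamma_1,\dots,\gamma_d)$ and $\vec q=(q_1,\dots,q_d)$ given by (6.1) we define the quantities $\gamma$, $\upsilon$ and $L_\gamma$ by the
formulas

\[ \frac1\gamma := \sum_{j=1}^d\frac{1}{\gamma_j},\qquad\frac1\upsilon := \sum_{j=1}^d\frac{1}{\gamma_jq_j},\qquad L_\gamma := \prod_{j=1}^dL_j^{1/\gamma_j}. \tag{6.15} \]

Note some useful inequalities between the quantities defined above. First, $\gamma_j<\beta_j$ for all $j\in I_-$,
which is a consequence of the fact that $\tau(p)<\tau_j$ for $j\in I_-$. This implies

\[ \frac1\gamma-\frac1\beta = \sum_{j\in I_-}\Big(\frac{1}{\gamma_j}-\frac{1}{\beta_j}\Big)\ge0. \tag{6.16} \]

Next, if $s\ge1$ then

\[ \frac1s\ge\frac1\upsilon. \tag{6.17} \]

We have

\[ \frac1s-\frac1\upsilon = \sum_{j\in I_-}\Big(\frac{1}{\beta_jr_j}-\frac{1}{\gamma_jp}\Big) = \sum_{j\in I_-}\frac{1}{\beta_j\tau(p)p}\Big(\frac{\tau(p)p}{r_j}-\tau_j\Big). \]

Hence (6.17) will be proved if we show that $r_j^{-1}\tau(p)p\ge\tau_j$ for all $j\in I_-$. Indeed,

\[ \frac{\tau(p)p}{r_j} = \frac{p(1-1/s)+1/\beta}{r_j}\ge1-\frac1s+\frac{1}{\beta r_j} = \tau_j, \]

where to get the inequality we have used that $r_j<p$ for any $j\in I_-$ and that $s\ge1$.
Finally, remark also that

\[ p-\upsilon(2+1/\gamma)<0. \tag{6.18} \]

Indeed, since $r_j\ge p$ for any $j\in I_+\cup I_\infty$, we have $q_j\ge p$ for all $j\in I$ and

\[ \frac1\upsilon = \sum_{j=1}^d\frac{1}{\gamma_jq_j}\le\frac1p\sum_{j=1}^d\frac{1}{\gamma_j} = \frac{1}{p\gamma}. \]

This yields $p\le\upsilon/\gamma$, and (6.18) follows.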

6.4. Auxiliary results 

For 6* G (0, 1] and for some constant ci > define 

mo(0) := min < m G Z : 2 



(6.19) 



Note that $1-\theta/s+1/\beta>0$ for any $\theta\in(0,1]$, since $s\ge\beta$ by $r_j\ge1$, $j=1,\dots,d$. Therefore
$m_0(\theta)<0$ for large enough $n$.

It will be convenient to introduce the following notation 

mi := min {m G Z : 2™['"(2+i/7)-«(2+i//3)] > (L^/L^)'"^-(i//5-i/7) |. (6.20) 
It follows from this definition that 

(L,/L,)V^^/^"^/^ 



i>{2+l/7)-i,(2+l/,9) mi 



< 2™i < 2 



u(2+l/7)-s(2+l//3) 



In view of (6.16) and (6.17) 



yi2 + ^)-s{2 + ^^=sv 



Iwl 1\ 1/1 1 



>0; 



.21) 



(6.22) 



hence mi > 1 for large n. 

The bounds on '^^^(e) and Jm are given in the next two propositions. 

Proposition 1. There exist constants ci,C2 > and Ci,C2 > such that any n large enough the 
following statements hold. 

(i) Let f G G0{R), 9 G (0, 1]; then for any mo(^) < jtt, < one has 

Jm < Ci2'"V^' 175+171; <^P. 

(ii) For any m G Z satisfying 1 < 2™ < C2ip~^ one has 

< (722™[P-^(2+l//3)]^_ 

(iii) Let s > 1; then for any m G Z such that m > mi and 2"^ < C2i^~^ one has 

-L.^^IP- 



Jm < C2^ 



Lfi^^h 
2m[p-i.(2+l/7)]



(6.23) 
(6.24) 

(6.25) 



Proposition 2. There exist constants $C_3,C_4>0$ such that the following statements hold.
(i) Let $\nu$ be defined in (3.4). Then for all large enough $n$ and for any density $f$ one has



J-,^i) = \f{x)-f{x)\^dx < C,{Lp6r. (6.26) 

(ii) Let $\nu(\theta)$ be defined in (4.3). Then for any $\theta\in(0,1)$ and for all $n$ large enough

sup / \f{x)-f{x)\''dx<C^{Lp6r^'\ (6.27) 



6.5. Proof of Theorem 2 

Using (6.14) and inequality (6.26) of Proposition 2 we obtain 



oo 



¥.f\\f-f\\P<ci{Lp5y + C2 Yl Jrn. (6.28) 

m=mo (1) 

We proceed with bounding the second term on the right hand side of the last display formula. 
First, because ||/||oo < M, 

max \\Bljif,-)\\oo < 2''Mkl, supP^IU < 2'^MkL. 

]=^,--;<i ' ri>0 

This implies that there exists a constant $c_3>0$ with the following property:

\[ m_2 := \min\{m\in\mathbb{Z}:2^m>c_3\varphi^{-1}\}\quad\Longrightarrow\quad J_m=0,\ \forall m>m_2. \]

Thus the sum on the right hand side of (6.28) extends from $m_0(1)$ to $m_2$.

1°. Tail zone: $p<\frac{2+1/\beta}{1+1/s}$. Using bounds (6.23) with $\theta=1$ and (6.24) of Proposition 1, we obtain



° 2+1/0. 



Jm < C,ipP[ 2"^(^-^) + ^2-[f-^(2+l//3)]J < ^^^P2'"°(^)(^-^\ 

m=mo(l) m=mo(l) m=l 

where the last inequality follows from the fact that $m_0(1)<0$ and $p<\frac{2+1/\beta}{1+1/s}<s(2+1/\beta)$. Using
(6.19), after straightforward algebra we obtain that

°° -1 

m=mo(l) 

2°. Dense zone: $\frac{2+1/\beta}{1+1/s}<p<s(2+1/\beta)$. Because $p>\frac{2+1/\beta}{1+1/s}$, by Proposition 1, inequality (6.23)
with $\theta=1$,

Y Jm < cr^" Y 2"^""^^ < C8(/3P = C8(L/3(5)WT. (6.29) 

m=mo(l) m=mo(l) 
Furthermore, because $p<s(2+1/\beta)$ we have by Proposition 1, inequality (6.24), that



m2 



m,2 



m=l 



Thus, in the dense zone 



m2 



m=m,o (1) 

3°. Sparse zone: $p>s(2+1/\beta)$, $s<1$. First we note that the bound in (6.29) remains true since
$p>s(2+1/\beta)$. For the same reason, in view of Proposition 1, inequality (6.24),



m2 



.30) 



m=l 



Here we have used the definition of m2. It remains to note that conditions p > s(2 + 1//3), s < 1 
imply that ipPd"^ — )• as n — )• 0. Therefore the statement of the theorem fohows from (6.29) and 
(6.30). 

4°. Sparse zone: $p>s(2+1/\beta)$, $s\ge1$. We need to bound only $\sum_{m=1}^{m_2}J_m$, because (6.29) remains

true. By inequality (6.24) of Proposition 1 and because $p>s(2+1/\beta)$



mi 



p^m-i (p-s{2+i)) 



m=l 



Next, we have in view of the inequality (6.25) of Proposition 1 



m2 



m=mi+l 

Since $p-\upsilon(2+1/\gamma)<0$ [see (6.18)],



1/7 



m=mi+l 



ni2 



m=m\+l 



2mi(p-«[2+l/7]) < ci6(/5P2™i(P~*[2+^/'^l). 



In order to obtain the second inequality we have used (6.21). Thus, 



^ Jm<Ci7</?^2-i[^'-^(2+l//5)]_ 



m=l 



Using equality (6.22) and (6.21) we obtain 



mi 



"Y Jm < C2o(L^/L 



p-a(2 + l//3) 

s(2+l//3)(l/s-l/v)+(l/7-l//3) 



p(l/s-l/^.) + l/7-l//3 
{Lpd) (2+l/;3){l/s-l/i;) + (l/7-l//3)s-l . 



m=l 



The statement of the theorem is now obtained by the following routine computations. Denote 

1_^1 ^ _sr^ ^ 



s- pP- ' 
First, we remark that



Next, 



p[ +~~« = + ir = ir^'^P- 



r(p)/3_ r(p)/3Vs- r{p)(3pf3_ 
1 / 1 1 \ ^ 1 ^ 



Hence, $1/\gamma-1/\beta=1/\gamma_--1/\beta_-=A/(\tau(p)\beta)$, which implies that



s V s- pj- P\P- 7-/ V Vt{p)P 

The last two equalities yield



2 + + - 



2 + 4)f^(p)-^)+^ 



A ^ 1 

2 + 7T-^ , 



pJ\s V/ \7 /3/ s LV p/3/ S/3J ''"(p) V /3 s/3 

where the last equality follows from the fact that $\tau(p)-1/(p\beta)=1-1/s$. This together with (6.31)
leads to the statement of the theorem in the sparse zone.

5°. Boundary zones: $p=s(2+1/\beta)$, $p=\frac{2+1/\beta}{1+1/s}$. Here the proof coincides with the proof for the
dense zone with the only difference that the corresponding sums equal $|m_0(1)|$ and $m_2$ respectively.
This results in an extra $\ln(1/\delta)$ factor in the final bounds. ■



6.6. Proof of statement (i) of Theorem 4 

In view of (6.14) and by bound (6.27) of Proposition 2, 



00 

IE/||/-/||^ < ci{Lp6Y'''''^ + C2 

r?i=mo(0) 



If $p<\frac{2+1/\beta}{1/\theta+1/s}$ then, using bounds (6.23) and (6.24) of Proposition 1, we have



^ / 2+1//3 \ 2+1/13 \ p-e 

m=mo{9) m=m,o(6) 

and the assertion of the theorem follows. If $s(2+1/\beta)>p\ge\frac{2+1/\beta}{1/\theta+1/s}$ then



m=mo (9) 
7. Proofs of Theorem 3, statement (ii) of Theorem 4 and the lower bound in (4.4)



The proof is organized as follows. First, we formulate two auxiliary statements, Lemmas 4 and 5. 
Second, we present a general construction of a finite set of functions employed in the proof of lower 
bounds. Then we specialize the constructed set of functions in different regimes and derive the 
announced lower bounds. 



7.1. Auxiliary lemmas 

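The first statement given in Lemma 4 is a simple consequence of Theorem 2.4 from Tsybakov
(2009). Let $\mathbb{F}$ be a given set of probability densities.

Lemma 4. Assume that for any sufficiently large integer $n$ one can find a positive real number $\rho_n$
and a finite subset of functions $\{f^{(0)},f^{(j)},\ j\in\mathcal{J}_n\}\subset\mathbb{F}$ such that

\[ \|f^{(i)}-f^{(j)}\|_p\ge2\rho_n,\quad\forall i,j\in\mathcal{J}_n\cup\{0\}:\ i\ne j; \tag{7.1} \]

\[ \limsup_{n\to\infty}\frac{1}{|\mathcal{J}_n|^2}\sum_{j\in\mathcal{J}_n}\mathbb{E}_{f^{(0)}}\Big\{\frac{d\mathbb{P}_{f^{(j)}}}{d\mathbb{P}_{f^{(0)}}}(X^{(n)})\Big\}^2 =: C<\infty. \tag{7.2} \]

Then for any $q\ge1$

\[ \liminf_{n\to\infty}\inf_{\tilde f}\sup_{f\in\mathbb{F}}\rho_n^{-1}\big(\mathbb{E}_f\|\tilde f-f\|_p^q\big)^{1/q}\ge\big(\sqrt{C}+\sqrt{C+1}\big)^{-2/q}, \]

where the infimum on the left hand side is taken over all possible estimators.

We will apply Lemma 4 with $\mathbb{F}=\mathbb{N}_{\vec r,d}(\vec\beta,\vec L,M)$ in the proof of Theorem 3 and with $\mathbb{F}=
\mathbb{G}_\theta(R)\cap\mathbb{N}_{\vec r,d}(\vec\beta,\vec L,M)$ in the proof of statement (ii) of Theorem 4.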

Next we quote the Varshamov-Gilbert lemma [see, e.g., Lemma 2.9 in Tsybakov (2009)].

Lemma 5 (Varshamov-Gilbert). Let $\varrho_m$ be the Hamming distance on $\{0,1\}^m$, $m\in\mathbb{N}^*$, i.e.

\[ \varrho_m(a,b) = \sum_{j=1}^m\mathbf{1}\{a_j\ne b_j\},\quad a,b\in\{0,1\}^m. \]

For any $m\ge8$ there exists a subset $V_m$ of $\{0,1\}^m$ such that $|V_m|\ge2^{m/8}$, and

\[ \varrho_m(a,a')\ge\frac{m}{8},\quad\forall a,a'\in V_m,\ a\ne a'. \]
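For small $m$ the conclusion of Lemma 5 can be verified by direct search. The sketch below uses a greedy scan of $\{0,1\}^m$; this is an illustrative check under the assumption that a greedy scan suffices for the chosen $m$, not the argument by which the lemma is proved, and the function names are hypothetical.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of coordinates in which the binary words a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_separated_subset(m, min_dist, target_size):
    """Scan {0,1}^m in lexicographic order and keep every word that is at
    Hamming distance >= min_dist from all words kept so far; stop as soon
    as target_size words have been collected."""
    chosen = []
    for word in product((0, 1), repeat=m):
        if all(hamming(word, kept) >= min_dist for kept in chosen):
            chosen.append(word)
            if len(chosen) >= target_size:
                break
    return chosen


if __name__ == "__main__":
    m = 16
    subset = greedy_separated_subset(m, min_dist=m // 8, target_size=2 ** (m // 8))
    print(len(subset), min(hamming(a, b) for a, b in combinations(subset, 2)))
```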

7.2. Proof of Theorem 3. General construction of a finite set of functions 

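1°. For any $t\in\mathbb{R}$ set

\[ \Lambda(t) = \Big(\int_{-1}^1e^{-1/(1-y^2)}\,dy\Big)^{-1}e^{-1/(1-t^2)}\,\mathbf{1}_{[-1,1]}(t). \]

Note that $\Lambda$ is a probability density compactly supported on $[-1,1]$ and infinitely differentiable on
the real line, $\Lambda\in C^\infty(\mathbb{R}^1)$. Obviously, for any $\alpha>0$ and $r\ge1$ there exists a constant $c_1=c_1(\alpha,r)<
\infty$ such that

\[ \Lambda\in\mathbb{N}_{r,1}(\alpha,c_1). \tag{7.3} \]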
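The normalisation of the bump function can be checked numerically. The following is a minimal sketch, assuming the standard form $e^{-1/(1-t^2)}$ on $(-1,1)$; the midpoint rule and the grid sizes are choices made for this illustration, and the function names are hypothetical.

```python
import math


def bump(t):
    """Unnormalised bump exp(-1 / (1 - t^2)) on (-1, 1), zero elsewhere."""
    gap = 1.0 - t * t
    if gap <= 0.0:
        return 0.0
    return math.exp(-1.0 / gap)


def midpoint_integral(func, lo, hi, steps):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(func(lo + (k + 0.5) * width) for k in range(steps)) * width


# Normalising constant computed once on a fine grid.
NORMALISER = midpoint_integral(bump, -1.0, 1.0, 20_000)


def bump_density(t):
    """Normalised bump: a smooth probability density supported on [-1, 1]."""
    return bump(t) / NORMALISER


if __name__ == "__main__":
    print(NORMALISER, midpoint_integral(bump_density, -1.0, 1.0, 5_000))
```

Because all derivatives of the bump vanish at the endpoints, the midpoint rule converges very quickly here, so a coarser grid already reproduces total mass one to high accuracy.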
Define

\[ \bar f^{(0)}(x) = N^{-d}\prod_{l=1}^d\int\Lambda(y-x_l)\mathbf{1}_{[-N/2,N/2]}(y)\,dy,\quad x=(x_1,\dots,x_d)\in\mathbb{R}^d, \]

where the parameter $N=N(n)>8$ will be chosen later. By construction, $\bar f^{(0)}$ is a probability density
for any choice of $N$, $\mathrm{supp}(\bar f^{(0)})=[-N/2-1,N/2+1]^d$, and

\[ \bar f^{(0)}(x) = N^{-d},\quad\forall x\in[-N/2+1,N/2-1]^d. \tag{7.4} \]

Moreover, in view of (7.3) and by the Young inequality, there exist constants $\vec C=(C_1,\dots,C_d)$
depending on $\vec\beta$ and $\vec r$ only such that

\[ \bar f^{(0)}\in\mathbb{N}_{\vec r,d}(\vec\beta,\vec C). \tag{7.5} \]

Note that $\vec C$ does not depend on $N$.

Let $L_0>0$ be fixed, and let $f^{(0)}(x)=\kappa^d\bar f^{(0)}(\kappa x)$, where $\kappa>0$ is chosen in such a way that $f^{(0)}$
belongs to the class $\mathbb{N}_{\vec r,d}(\vec\beta,2^{-1}\vec L_0)$, where $\vec L_0=(L_0,\dots,L_0)$. The existence of such $\kappa$ independent
of $N$ and determined by $\vec\beta$, $\vec r$ and $L_0$ is guaranteed by (7.5). Note also that $f^{(0)}$ is a probability
density. Moreover, we remark that $\|f^{(0)}\|_\infty\le\kappa^dN^{-d}$ since $\int\Lambda=1$. Thus,

\[ f^{(0)}\in\mathbb{N}_{\vec r,d}(\vec\beta,\vec L_0/2,M/2), \tag{7.6} \]

provided that $N\ge(2M^{-1})^{1/d}\kappa$. This condition is assumed to be fulfilled.

2°. Put for any $t\in\mathbb{R}$

\[ g(t) = \int\Lambda(y-t)\big[\mathbf{1}_{[0,1]}(y)-\mathbf{1}_{[-1,0]}(y)\big]\,dy. \]

We obviously have $g\in C^\infty(\mathbb{R}^1)$ and

\[ \text{(i)}\ \int g(y)\,dy=0,\qquad\text{(ii)}\ \mathrm{supp}(g)\subset[-2,2],\qquad\text{(iii)}\ \|g\|_\infty\le1. \tag{7.7} \]

For any $l=1,\dots,d$ let $(20\kappa)^{-1}>\sigma_l=\sigma_l(n)\to0$, $n\to\infty$, be the sequences to be specified
later. Let $M_l=(20\kappa\sigma_l)^{-1}N$, and without loss of generality assume that $M_l$, $l=1,\dots,d$, are integer
numbers. Define also



A^ 



4x 



+ 8jai, j = l,...,Mi, 1 = 1,. ..,d, 



and let $\mathcal{M}=\{1,\dots,M_1\}\times\cdots\times\{1,\dots,M_d\}$. For any $m=(m_1,\dots,m_d)\in\mathcal{M}$ define

\[ G_m(x) = \prod_{l=1}^dg\Big(\frac{x_l-x_{m_l,l}}{\sigma_l}\Big),\quad x\in\mathbb{R}^d, \]

\[ \Pi_m = [x_{m_1,1}-3\sigma_1,\ x_{m_1,1}+3\sigma_1]\times\cdots\times[x_{m_d,d}-3\sigma_d,\ x_{m_d,d}+3\sigma_d]\subset\mathbb{R}^d. \]

Several remarks on these definitions are in order. First, in view of (7.7)(ii)

\[ \mathrm{supp}(G_m)\subset\Pi_m,\quad\forall m\in\mathcal{M}, \tag{7.8} \]

\[ \Pi_m\cap\Pi_j=\emptyset,\quad\forall m,j\in\mathcal{M}:\ m\ne j. \tag{7.9} \]

Second, since $g\in C^\infty(\mathbb{R}^1)$, we have that $G_m\in C^\infty(\mathbb{R}^d)$ for any $m\in\mathcal{M}$. Moreover, for any
$l=1,\dots,d$, any $|h|\le\sigma_l$ and any integer $k$



(7.10) 



where $D_l^kG$ stands for the $k$th order derivative of a function $G$ with respect to the variable $x_l$, and
$\Delta_{h,l}$ is the first order difference operator with step size $h$ in the direction of the variable $x_l$.
For $m\in\mathcal{M}$ define

\[ \pi(m) = \sum_{j=1}^{d-1}(m_j-1)\Big(\prod_{l=j+1}^dM_l\Big)+m_d. \]

It is easily checked that $\pi$ defines an enumeration of the set $\mathcal{M}$, and $\pi:\mathcal{M}\to\{1,2,\dots,|\mathcal{M}|\}$ is a
bijection. Let $W$ be a subset of $\{0,1\}^{|\mathcal{M}|}$. Define a family of functions $\{F_w,\ w\in W\}$ by

\[ F_w(x) = A\sum_{m\in\mathcal{M}}w_{\pi(m)}G_m(x),\quad x\in\mathbb{R}^d, \]

where $w_j$, $j=1,\dots,|\mathcal{M}|$, are the coordinates of $w$, and $A$ is a parameter to be specified. It follows
from (7.7)(iii), (7.8) and (7.9) that

\[ \|F_w\|_\infty\le A,\quad\forall w\in W, \tag{7.11} \]

and (7.7)(i) implies that

\[ \int F_w(x)\,dx=0,\quad\forall w\in W. \tag{7.12} \]
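The enumeration $\pi$ is a mixed-radix (row-major) indexing of the grid $\mathcal{M}$. The sketch below assumes the form $\pi(m)=\sum_{j=1}^{d-1}(m_j-1)\prod_{l=j+1}^dM_l+m_d$ with 1-based indices and checks on a small grid that it is a bijection onto $\{1,\dots,|\mathcal{M}|\}$; the function name and the grid sizes are hypothetical choices for this illustration.

```python
from itertools import product


def enumerate_index(m, sizes):
    """Row-major enumeration of a multi-index m = (m_1, ..., m_d), where
    1 <= m_j <= sizes[j-1]; returns a value in {1, ..., prod(sizes)}."""
    d = len(sizes)
    value = m[-1]
    for j in range(d - 1):
        tail = 1
        for size in sizes[j + 1:]:
            tail *= size
        value += (m[j] - 1) * tail
    return value


if __name__ == "__main__":
    sizes = (2, 3, 4)
    grid = product(*(range(1, size + 1) for size in sizes))
    print(sorted(enumerate_index(m, sizes) for m in grid))
```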



3*^. Now we find conditions which guarantee that Fyj G N^^rf(/3,2 ^L) for any w G W. 
Fix / = 1, . . . , d, and let /c; = [/3/J + 1 if A ^ N*, and A;; = [A J + 2 if /3« G N* (here [xj stands 
for the maximal integer number strictly less than x). 
First, for any w and /i G M 



a''' F 



<l 









(7.13) 



where the last inequality is found in (Nikol'skii 1977, Section 4.4.4). Next, in view of (7.9) and 
(7.10) we obtain for any w £ W and any ri ^ oo 



Ah,i{Df^~'F^)r = V / A,4Df'-'F^){x 



dx 



dx 



< A'^Sw\\g\ 



\{d-l)r, -{ki-l)r, 



h 



9 



{kl-l)( 



where we have put Sw ■= sup^g^y \{j : Wj ^ 0}\. Thus, for any r; 7^ oo we have 

d 



Ah,i{Df^-'F^ 



Sw Yl 



g'^'^-'H ■ --) - 9^''-'H-) ■ (7.14) 

0"/ 
Similarly, we get for any $w\in W$



= sup sup 



^ sup 'u;^(j) sup A/j,/(L»/' Gj){x) 



— 1 1'^ 1 1 OO 1 



(7.15) 



In view of (7.7)(ii) and \h\ < ai, function • -[h/ai]) - g'''''~^\-) is supported on [-3,3]. 

Therefore the fact that g £ C°°{M}) implies for any G [1, oo] 



'''I 



-ki+l 



In the last inequality we have used that < /3; — fc; + 1 < 1 by definition of ki . Combining this with 
(7.13), (7.14) and (7.15) we have for any \h\ < ai and any r/ G [l,oo] 



a'"' F 



\d-l\ 

In 



d \ 1/n 



(7.16) 



If \h\ > ai then we note that Ah,iiD'['-^F^){-) = {d'['-^F^){- - hei) - {d'['-^F^){-), and by the 
triangle inequality 



Ah,i{D'^'~'F^) 



< 2 




< 2 


D^'~^F 











In view of (7.8) and (7.9) we get for any w £ W and any r; / oo 

ri 



/ Df'-'FUx) ' dx = A^' ^-0) / A''~'G,(x) 



dx 



\{d-l)ri 

In 



-r"'"(n 



Moreover, 



sup sup 



F>^^ ^Fyj{x) = A sup sup D^^ ^Gj{x) 

jeM xeUj 



< Ay 



a 



We obtain finally from (7.13) that for any \h\ > ai and any r/ G [l,oo] 



<A\hM\g\r'U'^'-'^ 



ai^'iSwU^A . (7.17) 

^ .7 = 1 ^ 



Combining (7.16) and (7.17) we conclude that for any w £W and ri £ [l,oo] 



A^i F 



/ d \ 1/n 
<CiA\h\^^al^'{Sw\[<yj\ , V/^GM^ 
where Ci = max^dlffll'^ Vax{6i/'^' H^^'^'^Hoo, 2||c/('='-^)||^,}). Thus, if



Aa^^'iSwllaA <{2Cir^Li, V/ = l,...,d (7.18) 

V j=i / 

then $F_w\in\mathbb{N}_{\vec r,d}(\vec\beta,2^{-1}\vec L)$ for any $w\in W$.
4°. Define for any $w\in W$

\[ f_w(x) = f^{(0)}(x)+F_w(x),\quad x\in\mathbb{R}^d. \]

Recall that $f^{(0)}$ is a probability density belonging to $\mathbb{N}_{\vec r,d}(\vec\beta,\vec L_0/2,M/2)$. Therefore, in view of
(7.12) and under condition (7.18), for any $w\in W$



/ fu,{x)d 



x = l, G N^,rf(/3,L), 



(7.19) 



where the latter inclusion holds because miiij^i^ ^ Lj > Lq. 
By construction of F^, for any w £ W 



FUx) = 0, yxi - (AT _ 4) (AT + 4) 

L 4X 4X 



This yields 

/«,(3;) = /°)(x)>0, Vx 
On the other hand, by (7.4) 

/(0)(a;) = x-^iV-^, VxG 
Therefore, if we require 



J_(Ar_4), J-(Ar + 4)l'. 
4x 4x J 



— (iV-4), — (iV + 4) 
4x 4x- 



d AT— d 



A < K^N 



(7.20) 



(7.21) 



(7.22) 



(7.23) 



this together with (7.13) implies 



/^(x)>0, VxG [--^-(iV-4), -^-(iV + 4)j^ 

We conclude that $f_w\ge0$ for any $w\in W$. Moreover, we get from (7.6), (7.11) and (7.23) that
$\|f_w\|_\infty\le M$ for any $w\in W$.

All this, together with (7.19), shows that $\{f^{(0)},f_w,\ w\in W\}$ is a finite set of probability densities
from $\mathbb{N}_{\vec r,d}(\vec\beta,\vec L,M)$. Thus Lemma 4 is applicable with $\mathcal{J}_n=W$ and $\mathbb{F}=\mathbb{N}_{\vec r,d}(\vec\beta,\vec L,M)$.



5". Suppose now that the set W is chosen so that 



(7.24) 
where, we recall, $\varrho_{|\mathcal{M}|}$ is the Hamming distance on $\{0,1\}^{|\mathcal{M}|}$. Here $B=B(n)\ge1$ is a parameter
to be specified. Then we deduce from (7.19), (7.8) and (7.9), that for all $w,w'\in W$



\\f..-fn,'\\l = \\F^-F^,\\1 = ApY,W 



> 9 



^ 7 = 1 / 



0') - "'-(J) 



(i) - "'-(i) 



dx 



^.7 = 1 ^ 



(7.25) 



Here we have used that the map $\pi$ is a bijection. Putting $C_2 = \tfrac12\|g\|_p$ we conclude that condition (7.1) of Lemma 4 is fulfilled with

$$\rho_n = C_2 A\Big(B\prod_{j=1}^d\sigma_j\Big)^{1/p}. \tag{7.26}$$



Let us remark that (7.26) remains true if we formally put $p = \infty$. Indeed, similarly to (7.25)

$$\|f_w - f_{w'}\|_\infty = \|F_w - F_{w'}\|_\infty = A\sup_{j=1,\ldots,|\mathcal M|}|w_j - w'_j|\,\|g\|_\infty \ge A\|g\|_\infty. \tag{7.27}$$

Here we have used (7.9), the fact that the map $\pi$ is a bijection, and that $w \ne w'$ for all $w,w' \in W$ in view of (7.24).

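Now we verify condition (7.2) of Lemma 4. First observe that

$$\frac{d\mathbb{P}_{f_w}}{d\mathbb{P}_{f^{(0)}}}\big(X^{(n)}\big) = \prod_{k=1}^n\frac{f_w(X_k)}{f^{(0)}(X_k)}.$$

Since $X_k$, $k = 1,\ldots,n$, are i.i.d. random vectors, we have for any $w \in W$

$$\mathbb{E}_{f^{(0)}}\Big[\frac{d\mathbb{P}_{f_w}}{d\mathbb{P}_{f^{(0)}}}\big(X^{(n)}\big)\Big]^2 = \Big(\int\Big[f^{(0)}(x) + 2F_w(x) + \frac{F_w^2(x)}{f^{(0)}(x)}\Big]dx\Big)^n = \Big(1 + \int\frac{F_w^2(x)}{f^{(0)}(x)}\,dx\Big)^n.$$

The last equality follows from (7.12). By (7.20) and (7.22),

$$\int\frac{F_w^2(x)}{f^{(0)}(x)}\,dx = \varkappa^{-d}N^{d}\|F_w\|_2^2,$$

hence for any $w \in W$

$$\mathbb{E}_{f^{(0)}}\Big[\frac{d\mathbb{P}_{f_w}}{d\mathbb{P}_{f^{(0)}}}\big(X^{(n)}\big)\Big]^2 = \big(1 + \varkappa^{-d}N^{d}\|F_w\|_2^2\big)^n \le \exp\big\{n\varkappa^{-d}N^{d}\|F_w\|_2^2\big\}.$$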
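The second-moment computation above can be checked on a toy discrete analogue. The following is only a sketch under stated assumptions: the eight-cell uniform "density" `f0`, the zero-sum perturbation `F` and the sample size `n` are hypothetical stand-ins for $f^{(0)}$, $F_w$ and $n$, not objects from the proof.

```python
import math
from fractions import Fraction

def chi2_second_moment(f0, F):
    """Return (sum f_w^2 / f0, 1 + sum F^2 / f0) for f_w = f0 + F on a finite grid."""
    fw = [a + b for a, b in zip(f0, F)]
    lhs = sum(p * p / q for p, q in zip(fw, f0))     # E_0[(f_w / f_0)^2]
    rhs = 1 + sum(b * b / q for b, q in zip(F, f0))  # valid when sum f0 = 1, sum F = 0
    return lhs, rhs

# Hypothetical example: uniform f0 on 8 cells and a perturbation summing to zero.
f0 = [Fraction(1, 8)] * 8
F = [Fraction(1, 40), Fraction(-1, 40)] * 4
lhs, rhs = chi2_second_moment(f0, F)

n = 50
product_moment = rhs ** n                  # i.i.d. sample: the second moment factorises
bound = math.exp(n * float(rhs - 1))       # (1 + x)^n <= exp(n x)
print(lhs == rhs, float(product_moment) <= bound)
```

Exact rational arithmetic is used so that the identity can be compared with `==` rather than up to rounding.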
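Repeating computations that led to (7.17) we have

$$\|F_w\|_2^2 \le A^2\|g\|_2^2\,S_W\prod_{j=1}^d\sigma_j.$$

The right hand side of the latter inequality does not depend on $w$; hence we have

$$\frac{1}{|W|^2}\sum_{w\in W}\mathbb{E}_{f^{(0)}}\Big[\frac{d\mathbb{P}_{f_w}}{d\mathbb{P}_{f^{(0)}}}\big(X^{(n)}\big)\Big]^2 \le \exp\Big\{C_3nA^2N^{d}S_W\prod_{j=1}^d\sigma_j - \ln(|W|)\Big\},$$

where we have put $C_3 = \varkappa^{-d}\|g\|_2^2$. Therefore, if

$$C_3nA^2N^{d}S_W\prod_{j=1}^d\sigma_j \le \ln(|W|), \tag{7.28}$$

then condition (7.2) of Lemma 4 is fulfilled with $C = 1$.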

In order to apply Lemma 4 it remains to specify the set $W$ and the parameters $A$, $N$, $\sigma_j$, $j = 1,\ldots,d$, so that the relationships (7.18), (7.23), (7.24), and (7.28) are simultaneously fulfilled. According to Lemma 4, under these conditions the lower bound is given by $\rho_n$ in (7.26).

7.3. Proof of Theorem 3. Derivation of lower bounds in different zones 

We begin with the construction of the set $W$. Let $m \ge 8$ be an integer whose choice will be made later, and, without loss of generality, assume that $|\mathcal M|/m$ is an integer. Let $V_m$ be a subset of $\{0,1\}^m$ such that

$$|V_m| \ge 2^{m/8}, \qquad \varrho_m(z,z') \ge m/8, \quad \forall z,z'\in V_m. \tag{7.29}$$

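Existence of such a set $V_m$ is guaranteed by Lemma 5. Let $J := \{1 + j|\mathcal M|/m : j = 0,\ldots,m-1\}$, and note that $J \subseteq \{1,\ldots,|\mathcal M|\}$ with equality in the case $m = |\mathcal M|$. Define the map $T : V_m \to \{0,1\}^{|\mathcal M|}$ by

$$T[a]_j = \begin{cases} a_{1+(j-1)m/|\mathcal M|}, & j\in J,\\ 0, & j\in\{1,\ldots,|\mathcal M|\}\setminus J,\end{cases}$$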

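and let $W = T(V_m)$. Obviously, $\varrho_{|\mathcal M|}(w,w') = \varrho_{|\mathcal M|}(T[a],T[a']) = \varrho_m(a,a')$ for all $w,w' \in W$; therefore (7.29) implies that

$$|W| \ge 2^{m/8}, \qquad \varrho_{|\mathcal M|}(w,w') \ge 8^{-1}m, \quad \forall w,w'\in W. \tag{7.30}$$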

With such a set $W$, $S_W \le m$; moreover, since $\ln(|W|) \ge m\ln 2/8$, condition (7.28) holds true if

$$A^2nN^{d}\prod_{j=1}^d\sigma_j \le (8C_3)^{-1}\ln 2. \tag{7.31}$$

We also note that condition (7.18) is fulfilled if we require

$$A\sigma_l^{-\beta_l}\Big(m\prod_{j=1}^d\sigma_j\Big)^{1/r_l} \le (2C_1)^{-1}L_l, \qquad \forall l = 1,\ldots,d. \tag{7.32}$$

In addition, (7.24) holds with $B = m/8$.
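For small $m$ a set with the two properties in (7.29) can be found by brute force. The sketch below is not the argument behind Lemma 5; it swaps in a plain greedy search, and the helper `embed` is a hypothetical analogue of the map $T$ that places the $m$ bits on an equally spaced index set and pads with zeros.

```python
import math
from itertools import combinations, product

def hamming(a, b):
    """Number of coordinates in which two binary words differ."""
    return sum(x != y for x, y in zip(a, b))

def greedy_packing(m, min_dist, target):
    """Greedily collect words of {0,1}^m with pairwise distance >= min_dist."""
    chosen = []
    for word in product((0, 1), repeat=m):
        if all(hamming(word, c) >= min_dist for c in chosen):
            chosen.append(word)
            if len(chosen) >= target:
                break
    return chosen

def embed(word, size):
    """Place the bits of `word` on indices 0, step, 2*step, ... and pad with zeros."""
    step = size // len(word)
    out = [0] * size
    for i, bit in enumerate(word):
        out[i * step] = bit
    return tuple(out)

m = 16
V = greedy_packing(m, m / 8, math.ceil(2 ** (m / 8)))
W = [embed(v, 3 * m) for v in V]   # 3 * m plays the role of |M|
print(len(V), min(hamming(a, b) for a, b in combinations(V, 2)))
```

Since the padding coordinates are zero in every embedded word, the embedding leaves all pairwise Hamming distances unchanged, which is the property used to pass from (7.29) to (7.30).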
7.3.1. Tail zone: $p \le \frac{2+1/\beta}{1+1/s}$

Let $m = |\mathcal M|$. By construction, $|\mathcal M| = \prod_{l=1}^d M_l = (20\varkappa)^{-d}N^{d}\prod_{l=1}^d\sigma_l^{-1}$ and, therefore, (7.32) is reduced to

$$A\sigma_l^{-\beta_l}N^{d/r_l} \le C_5L_l. \tag{7.33}$$

Thus, choosing

$$\sigma_l = C_6A^{1/\beta_l}L_l^{-1/\beta_l}N^{d/(\beta_lr_l)}, \tag{7.34}$$

we guarantee the fulfillment of (7.33) provided that $C_6 \ge \max_{l=1,\ldots,d}C_5^{-1/\beta_l}$. Moreover, with this choice (7.31) is reduced to

$$A^{2+1/\beta}N^{d(1+1/s)} \le C_7L_\beta n^{-1}, \tag{7.35}$$

where, as before, $L_\beta = \prod_{l=1}^dL_l^{1/\beta_l}$. Moreover, we have from (7.26)

$$\rho_n = C_8AN^{d/p}, \qquad C_8 = C_2(160\varkappa)^{-d/p}. \tag{7.36}$$

Let $N^{d} = C_9A^{-1}$, where the constant $C_9 \le \varkappa^{d}$ will be specified below; then (7.23) holds. Next, in view of (7.35) and (7.36)

$$A = C_{10}(L_\beta/n)^{\frac{1}{1-1/s+1/\beta}}, \qquad \rho_n = C_{11}(L_\beta/n)^{\frac{1-1/p}{1-1/s+1/\beta}} = C_{11}(L_\beta\alpha_nn^{-1})^{\nu}.$$

We remark that $N \to \infty$ as $n \to \infty$. It remains to check that $\sigma_l$, $l = 1,\ldots,d$, are small enough. It follows from (7.34) that if $r_l > 1$, then $\sigma_l \to 0$ as $n \to \infty$ since $A \to 0$. If $r_l = 1$, then

$$\sigma_l = C_6(C_9/L_l)^{1/\beta_l}.$$

Choosing $C_9$ small enough we guarantee that $\sigma_l \le (20\varkappa)^{-1}$ for all $l = 1,\ldots,d$. This condition is required in the construction of the family $G_m$, $m \in \mathcal M$. Thus, Lemma 4 can be applied with $\rho_n = C_{11}(L_\beta\alpha_nn^{-1})^{\nu}$, and the result follows.



7.3.2. Dense zone: $\frac{2+1/\beta}{1+1/s} < p \le s(2+1/\beta)$

Here, as in the previous case, we let $m = |\mathcal M|$. The relationships (7.34), (7.35) and (7.36) remain true, but our choice of $N$ will be different.

Let $N^{d} = C_{12}$ for some constant $C_{12}$. This yields in view of (7.35) and (7.36)

$$A = C_{13}(L_\beta/n)^{\frac{\beta}{2\beta+1}}, \qquad \rho_n = C_{14}(L_\beta/n)^{\frac{\beta}{2\beta+1}} = C_{14}(L_\beta\alpha_nn^{-1})^{\nu}.$$

The requirement (7.23) is obviously fulfilled since $A \to 0$ as $n \to \infty$. Moreover, we obtain from (7.34) that $\sigma_l \to 0$ as $n \to \infty$ and, therefore, $\sigma_l \le (20\varkappa)^{-1}$, $l = 1,\ldots,d$, for $n$ large enough. Thus, Lemma 4 can be applied with $\rho_n = C_{14}(L_\beta\alpha_nn^{-1})^{\nu}$ and the result follows.
7.3.3. Sparse zone: $s(2+1/\beta) < p < \infty$, $s < 1$

Let $A = C$ and $N^{d} = C_{17}$, and suppose that $C \le C_{17}^{-1}\varkappa^{d}$; then (7.23) is satisfied. Moreover (7.31) and (7.32) are reduced to

$$n\prod_{j=1}^d\sigma_j \le C^{-2}C_{18}, \qquad \sigma_l^{-\beta_l}\Big(m\prod_{j=1}^d\sigma_j\Big)^{1/r_l} \le C^{-1}C_{19}L_l, \quad \forall l = 1,\ldots,d. \tag{7.37}$$

Let $c_1$, $c_2$ be constants satisfying $c_1 \le C^{-2}C_{18}$ and $c_2 \le C^{-1}C_{19}$. It is straightforward to check that if we choose

$$m = c_1^{s-1}c_2^{s/\beta}L_\beta^{s}n^{1-s}, \qquad \sigma_l = (c_2L_l)^{-1/\beta_l}\big(c_1c_2^{1/\beta}L_\beta n^{-1}\big)^{s/(\beta_lr_l)}, \quad l = 1,\ldots,d, \tag{7.38}$$

then inequalities (7.37) are fulfilled. With this choice (7.26) is reduced to

$$\rho_n = C_2C\Big(8^{-1}m\prod_{l=1}^d\sigma_l\Big)^{1/p}. \tag{7.39}$$

It remains to verify that $\sigma_l$ are small enough, and that $m \ge 8$, $|\mathcal M|/m \ge 1$. Note that $m \to \infty$ as $n \to \infty$ because $s < 1$. Recall also that

$$|\mathcal M| = \prod_{l=1}^dM_l = (20\varkappa)^{-d}N^{d}\prod_{l=1}^d\sigma_l^{-1} = (20\varkappa)^{-d}C_{17}c_1^{-1}n;$$

hence $|\mathcal M|/m \ge (20\varkappa)^{-d}C_{17}(c_1c_2^{1/\beta})^{-s}L_\beta^{-s}n^{s}$. Thus $|\mathcal M|/m \ge 1$ for large enough $n$. We note also that $\sigma_l \le (c_2L_0)^{-1/\beta_l}$ for all $n$ large enough. Therefore, if we choose $C$ large enough and put $c_2 = C^{-1}C_{19}$ we can ensure that $\sigma_l \le (20\varkappa)^{-1}$ for all $l = 1,\ldots,d$. Thus, Lemma 4 can be applied with $\rho_n = CC_{20}(L_\beta\alpha_nn^{-1})^{\nu}$ and the result follows.

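7.3.4. Sparse zone: $s(2+1/\beta) < p < \infty$, $s \ge 1$

Here we consider another choice of the set $W$. Let $W = \{e_1, e_2, \ldots, e_{|\mathcal M|}\}$, where $e_j$, $j = 1,\ldots,|\mathcal M|$, is the canonical basis in $\mathbb{R}^{|\mathcal M|}$. With this choice

$$S_W = 1, \qquad |W| = |\mathcal M| = (20\varkappa)^{-d}N^{d}\prod_{l=1}^d\sigma_l^{-1},$$

and (7.24) holds with $B = 1$. Let $N^{d} = C_{14}$; then (7.18) and (7.28) take the form

$$A\sigma_l^{-\beta_l}\Big(\prod_{j=1}^d\sigma_j\Big)^{1/r_l} \le (2C_1)^{-1}L_l, \qquad \forall l = 1,\ldots,d; \tag{7.40}$$

$$A^2n\prod_{j=1}^d\sigma_j \le C_{15}\ln\Big(\prod_{j=1}^d\sigma_j^{-1}\Big). \tag{7.41}$$

Moreover, we get from (7.26)

$$\rho_n = C_{16}A\Big(\prod_{j=1}^d\sigma_j\Big)^{1/p}. \tag{7.42}$$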



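Put $\varepsilon = \sqrt{\ln n/n}$ and

$$A = c_1L_\beta^{\frac{1/2}{1-1/s+1/(2\beta)}}\varepsilon^{\frac{1-1/s}{1-1/s+1/(2\beta)}}, \qquad \sigma_l = c_2L_l^{-1/\beta_l}L_\beta^{\frac{1/2-1/r_l}{\beta_l(1-1/s+1/(2\beta))}}\varepsilon^{\frac{1-1/s+1/(\beta r_l)}{\beta_l(1-1/s+1/(2\beta))}}. \tag{7.43}$$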



We have 



, ^ 1//3 



and it is evident that Y[f=i — e^^^^^^^'^'' for all n large enoug h; hence ln(nf=i (rf'^) > lnn/(2/3+l). 
Then it is easily checked that our choice (7.43) satisfies (7.40) and (7.41) provided that

$$c_1 \le (2C_1)^{-1}, \qquad c_2 \le 1, \qquad c_1^2c_2^{d} \le C_{15}/(2\beta+1). \tag{7.44}$$

Here we have also used that $d - 1/s > 0$. Note also that if $s > 1$ then

$$A \to 0, \qquad \max_{l=1,\ldots,d}\sigma_l \to 0, \qquad n \to \infty,$$

which ensures (7.23) and $\sigma_l \le (20\varkappa)^{-1}$, $l = 1,\ldots,d$, for all $n$ large enough.
On the other hand, if $s = 1$ then we should add to (7.44) the conditions



1 

7- 2-2/3+1/13 ^ r-l a, 

ciivfl < Ciax , Co max 

P 1=1,. ..,d 



■ l//3i-2/(/3jr;) 
Y 2-2/S + 1/I3 J -1/A 
^13 ^0 



< (20x)~^ 



Obviously, both restrictions hold if we choose $c_1$ and $c_2$ small enough, but now these constants may depend on $\vec L$. Note, however, that if $\max_{l=1,\ldots,d}L_l \le L_\infty$ then $c_1$ and $c_2$ can be chosen depending on $L_0$ and $L_\infty$ only.

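Using (7.42) and (7.43) we conclude that Lemma 4 is applicable with

$$\rho_n = C_{17}L_\beta^{\frac{1/2-1/p}{1-1/s+1/(2\beta)}}\Big(\frac{\ln n}{n}\Big)^{\frac{1-1/s+1/(\beta p)}{2-2/s+1/\beta}}, \tag{7.45}$$

which completes the proof of statement (i) of the theorem.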
7.3.5. Proof of statement (ii): sparse zone, $p = \infty$, $s < 1$

The proof in this case coincides with the one for the sparse zone with $s < 1$. Thus, we keep (7.37), (7.38), and, in view of (7.27), (7.39) is replaced by $\rho_n = CC_2$. Since $\rho_n$ does not tend to $0$ as $n \to \infty$, a consistent estimator does not exist. All other details of the proof remain unchanged. This completes the proof of Theorem 3. ■
7.4. Proof of statement (ii) of Theorem 4

The proof goes along the lines of the proof of Theorem 3 with modifications indicated below.
We start with the following simple observation: for any $M > 0$ and $y > 0$ one has

$$\|g\|_\infty \le M,\ \ \mathrm{supp}(g) \subseteq [-y,y]^d \ \Longrightarrow\ g \in \mathbb{G}_\theta\big(M(2y+4)^{d/\theta}\big), \qquad \forall\theta\in(0,1]. \tag{7.46}$$

This is an immediate consequence of the fact that the conditions $\|g\|_\infty \le M$, $\mathrm{supp}\{g\} \subseteq [-y,y]^d$ imply that $\|g^*\|_\infty \le M$ and $\mathrm{supp}\{g^*\} \subseteq [-y-2, y+2]^d$.

Next, we note that the lower bounds of Theorem 3 in the dense and sparse zones are proved over the set of compactly supported densities. Hence they are valid also on $\mathbb{G}_\theta(R) \cap \mathbb{N}_{\vec r,d}(\vec\beta,\vec L,M)$, provided that $R$ is large enough. Hence, if $p \ge \frac{\theta(2+1/\beta)}{1+\theta/s}$ the assertion of the theorem follows.

Let $p < \frac{\theta(2+1/\beta)}{1+\theta/s}$. The proof of the lower bound here differs from the proof of Theorem 3 only in the construction of the function $f^{(0)}$.

Let /(^) be the function constructed exactly as in the proof of Theorem 3 with N = Nq fixed 
throughout the asymptotics n — )■ oo, and such that /'-''^ S N^^rf(/3, 4~-^Lo, 4~^M). Since A^'o is fixed, 
is compactly supported and, by (7.46) we have that /(o) G Ge{Ri) for some large enough 
Ri > 0. Define 

1=1 

where A'" = N(n) — )• oo will be specified later. Let f^^\x) = (;'^f^^\qx), where ? > is chosen to 
guarantee /^^^ G N^^(i(/?, 4~^Lo) 4~^M). We note however that, in contrast to the case 9 = 1, f^^^ is 
not a probability density. In particular, J f^^^ — )• as — )■ oo, because 6 < 1. Define 

/(') = (l-p^)/(°) + /('\ 
where p^ := J f^^^ ensures / /^^^ = 1. Thus, we can assert that 

f^''^ G n,.,0,2-'Lo,2-'M), I = 1, /(^) > 0. 

Note that, by construction, /(^^ is supported on the cube [{—N/2 — 1)/?, {N/2 + 1)/?]'^ and bounded 
by N-'^/\'^. Therefore, m view of (7.46), /(^) G Ge{R2) for some large enough i?2- 

Let W be the parameter set as defined in the proof of Theorem 3. For any w and any 9 <1 
we let 

f';!!\x) = f^'\x)+F^{x), xGM^ 
where functions F^, are constructed as in the proof of Theorem 3. If instead of (7.23) we require 

A < [x'^ + <j^]iV^'^/^ (7.47) 

then we obtain in view of (7.11) and (7.47) that {Fyj, w G W} C Gq{R^) for some large enough R^. 
All said above one allows to conclude that {f^^\ fw \ w G W} is a finite set of probability densities 
from Gg{R) H N^^(i(/3, L, M) for some large enough R > 0, and Lemma 4 is applicable with J7n = 
and F = Ge{R) n RF,d(/3, L, M). 
A^"



^1/9 



Ky-xi)h 



JV JV 
' 2 ' 2 



(y)dy 



(xi, . . . ,Xd) G 



We will follow the construction of the set $W$ for the tail zone given in Subsection 7.3.1. Choose $m = |\mathcal M|$ and note that (7.33), (7.34), (7.36) remain unchanged, while (7.35) should be replaced by

$$A^{2+1/\beta}N^{d(1/\theta+1/s)} \le C_7L_\beta n^{-1}. \tag{7.48}$$
Now we choose N'^ = cA~^ with c < x'' + ■j'^; then (7.47) is vahd. We obtain from (7.48) that 

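$$A = C_8(L_\beta/n)^{\frac{1}{1-\theta/s+1/\beta}}, \qquad \rho_n = C_9(L_\beta/n)^{\frac{1-\theta/p}{1-\theta/s+1/\beta}}.$$

Finally, because (7.34) remains intact, $\sigma_l \to 0$ as $n \to \infty$ for any $l = 1,\ldots,d$; this follows from $A \to 0$ and $\theta < 1$. This completes the proof. ■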



7.5. Proof of the lower bound in (4.4)

The required result will follow from the lower bound of Theorem 3 in the tail zone (see Section 7.3.1) if we show that for any given $R > 0$

$$f^{(0)} \notin \mathbb{G}_\theta(R), \qquad f_w \notin \mathbb{G}_\theta(R), \quad w \in W. \tag{7.49}$$

First we note that $f^{(0)}(x) = \varkappa^{d}N^{-d}$ for $x \in [-(N-2)/(2\varkappa), (N-2)/(2\varkappa)]^d$; therefore $\|f^{(0)}\|_\theta \to \infty$ as $N \to \infty$, because $\theta < 1$.

Next, in view of (7.21), $f_w(x) = f^{(0)}(x)$ for any $x \notin [-(N-4)/(4\varkappa), (N-4)/(4\varkappa)]^d$, which also implies $\|f_w\|_\theta \to \infty$ as $N \to \infty$.

It remains to note that in the tail zone the parameter $N$ is chosen so that $N = N(n) \to \infty$ as $n \to \infty$. This completes the proof of (7.49). ■



Appendix A: Proofs of auxiliary results of Section 5 
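A.1. Measurability

Write $\hat f(x, X^{(n)}) := \hat f_{\hat h(x)}(x)$, and note that the map $\hat f : \mathbb{R}^d \times \mathbb{R}^{dn} \to \mathbb{R}$ is completely determined by the kernel $K$ and the set $\mathcal H$. We need to show that $\hat f$ is a Borel function.

Let $\hat R_h(x, X^{(n)}) := \hat R_h(x)$, and note that for every $h \in \mathcal H$ the map $\hat R_h : \mathbb{R}^d \times \mathbb{R}^{dn} \to \mathbb{R}$ is a continuous function. This follows from the continuity of the kernel $K$ and from the fact that $\mathcal H$ is a finite set. The continuity of $K$ also implies that the map $\hat f_h : \mathbb{R}^d \times \mathbb{R}^{dn} \to \mathbb{R}$ is a continuous function for any $h \in \mathcal H$, where $\hat f_h(x, X^{(n)}) := \hat f_h(x)$. Next, denote by $\mathfrak{B}$ the Borel $\sigma$-algebra on $\mathbb{R}^d \times \mathbb{R}^{dn}$, and let $b : \mathbb{R}^d \times \mathbb{R}^{dn} \to \mathcal H$ be the function $b(x, X^{(n)}) := \hat h(x)$. We obviously have for any given $h \in \mathcal H$

$$\{(x,y) \in \mathbb{R}^d \times \mathbb{R}^{dn} : b(x,y) = h\} = \bigcap_{\eta\in\mathcal H}\{(x,y) \in \mathbb{R}^d \times \mathbb{R}^{dn} : \hat R_h(x,y) - \hat R_\eta(x,y) \le 0\} \in \mathfrak{B},$$

where the last inclusion follows from the continuity of $\hat R_\eta$, $\eta \in \mathcal H$. Here we have also used that $\mathcal H$ is a finite set. It remains to note that

$$\hat f_{\hat h(x)}(x) = \sum_{h\in\mathcal H}\hat f_h(x, X^{(n)})\,\mathbf{1}\{b(x, X^{(n)}) = h\},$$

and the required statement follows.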
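The decomposition of the selected estimator into a finite sum of indicator-weighted estimators can be mimicked numerically. This is a minimal sketch under stated assumptions: the Gaussian kernel, the one-dimensional sample, the bandwidth set and the function `toy_criterion` are all hypothetical and do not reproduce the selection rule of the paper; ties are broken by taking the first minimiser.

```python
import math

def kernel_estimate(x, sample, h):
    """Gaussian-kernel density estimate at x with bandwidth h (kernel choice is an assumption)."""
    norm = len(sample) * h * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample) / norm

def toy_criterion(x, sample, h):
    """Hypothetical stand-in for R_h(x): distance to the smallest-bandwidth estimate plus a penalty."""
    return abs(kernel_estimate(x, sample, h) - kernel_estimate(x, sample, 0.25)) + 1.0 / (len(sample) * h)

def select_bandwidth(x, sample, bandwidths):
    """First minimiser of the criterion over the finite bandwidth set."""
    return min(bandwidths, key=lambda h: toy_criterion(x, sample, h))

def selected_estimate(x, sample, bandwidths):
    """The selected estimator written as sum_h f_h(x) * 1{b(x) = h}."""
    b = select_bandwidth(x, sample, bandwidths)
    return sum(kernel_estimate(x, sample, h) * (1 if h == b else 0) for h in bandwidths)

sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
bandwidths = [0.25, 0.5, 1.0]
x0 = 0.2
b0 = select_bandwidth(x0, sample, bandwidths)
print(b0, selected_estimate(x0, sample, bandwidths))
```

Because the bandwidth set is finite, the sum has finitely many terms, which is what reduces measurability of the selected estimator to that of each summand.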
A.2. Proof of Lemma 1

1°. Note that Mh{g,x) = 4-'^Mh{g,x) and let no = {hen: Ah{g,x) > Ax In n/{nVh)}. 
For any h G T-Lq we have 



>cAfi{g,x)lnn ^ 2xlnn 



Mh{g,x) < ■^Ah{g,x) 



y nVh nVh 
Therefore, 

\Ah{g, x) - Ah{g, x)\ < Xh{g, x) + Mh{g, x) < Xhig, x) + i3/A)Ah{g, x). 
We have for any h G Hq 



\Mh{g,x)- Mhig,x)\ 



'xlnn Ah{g,x) - Ah{g,x) 



r^Vh A];\g,x) + A]l\g,x) 



< 



xhin I Xh{g,x) + {2>/A)Ah{g,x)\ I ^ , ^ , , , . 

-^TT- YJT, ^ - o^h{g,x) + -Mh{g,x). 

nVh \ AY\g,x) J 2 4 



It yields for any h G Ho 



[Mhig,x) - '^Mh{g,x)]^ < ^Xh{g,x), [Mh{g,x)-AMh{g,x)]^ < 2xh{g,x). (A.l) 



(b). Now consider the set $\mathcal H_1 := \mathcal H \setminus \mathcal H_0$. Here $A_h(g,x) < 4\varkappa\ln n/(nV_h)$, and, by definition of $M_h$, we have



1 , , , xlnn , , Sxlnn 

-;Ah{g,x)<——<Mhig,x)<——, yheTii. 
4 nVh nVh 

Note that we have Mh{g,x) > >clnn/{nVh) for all h. This together with (A. 2) shows that 

[Mh{g,x)-3Mh{g,x)U = 0, yheUi. 

Furthermore, for any /i G "Hi 

7x Inn 



Ah{g, x) < Ahig, x) + xhig, x) + Mhig, x) < xhig, x) + 



Therefore 



Mhig,x) 



KAh{g,x)lnn ^ xlnn ^ 



nVh 



nVh 



xXh{g,x)lnn ^ ^^xlnn 



nVh 



nVh 



(A.2) 



(A.3) 



< \^h{g,x) + {y7 + ^^^^ < ^xhig,x) + (^V7+^yUg,x). 
To get the penultimate inequality we have used that $\sqrt{|ab|} \le 2^{-1}(|a| + |b|)$. Thus, it is shown that



Mhig,x) - [V7+-jMhig,x) 



<^Xh{g,x), yhel-Li. 



(A.4) 



Relations (A.4), (A.3) and (A.1) imply the statement of the lemma.
A.3. Proof of Lemma 2

1°. Let $g : \mathbb{R}^d \to \mathbb{R}^1$ be a fixed bounded function, and let

$$\xi_h(g,x) := \frac1n\sum_{i=1}^ng_h(X_i - x) - \int g_h(t-x)f(t)\,dt, \qquad h \in \mathcal H.$$

With this notation $\xi_h(x) = \xi_h(K,x)$ and $\hat A_h(g,x) - A_h(g,x) = \xi_h(|g|,x)$. Therefore moment bounds on $\zeta_1(x)$, $\zeta_3(x)$ and $\zeta_4(x)$ will follow from those on $\xi_h(g,x)$ with substitution $g \in \{K, Q, |K|, |Q|\}$. Since $M_h(g,x)$ depends on $g$ only via $|g|$ and $\|g\|_\infty$ [see (2.5)-(2.6)], $M_h(g,x) = M_h(|g|,x)$, and moment bounds on $\zeta_1(x)$ and $\zeta_3(x)$ are identical. The bound on $\zeta_4(x)$ will follow from bounds on $\zeta_1(x)$ and $\zeta_3(x)$ with only one modification: the kernel $K$ should be replaced by $Q$. As for $\zeta_2(x)$, it cannot be represented in terms of $\xi_h(g,x)$ with a function $g$ independent of $h$ and $\eta$; see (2.3). However, the bounds on $\zeta_2(x)$ will be obtained similarly with minor modifications. Thus it suffices to bound $\mathbb{E}_f[\zeta_1(x)]^q$ and $\mathbb{E}_f[\zeta_2(x)]^q$.

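2°. We start with bounding $\mathbb{E}_f[\zeta_1(x)]^q$. For any $z > 0$, $h \in \mathcal H$ and $q \ge 1$ one has

$$\mathbb{E}_f\Big[|\xi_h(x)| - \sqrt{\frac{2k_\infty A_h(K,x)z}{nV_h}} - \frac{2k_\infty z}{3nV_h}\Big]_+^q \le 2\Gamma(q+1)\Big[\sqrt{\frac{2k_\infty A_h(K,x)}{nV_h}} + \frac{2k_\infty}{3nV_h}\Big]^qe^{-z}. \tag{A.5}$$

This inequality follows by integration of the Bernstein inequality and the following bound on the second moment of $\xi_h(x)$:

$$\mathbb{E}_f|\xi_h(x)|^2 \le \frac{k_\infty}{nV_h}\int|K_h(t-x)|f(t)\,dt = \frac{k_\infty A_h(K,x)}{nV_h}.$$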
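The deviation level appearing in this step has the shape $\sqrt{2vz} + bz/3$ produced by inverting the Bernstein inequality for a sum of independent centred variables bounded by $b$ with total variance $v$: the two-sided tail at that level is at most $2e^{-z}$. A small Monte Carlo sketch of that tail bound follows; the uniform summands, sample sizes and seed are arbitrary illustrative choices, not quantities from the lemma.

```python
import math
import random

def bernstein_threshold(variance_sum, b, z):
    """Level t(z) = sqrt(2 v z) + b z / 3 from the inverted Bernstein inequality."""
    return math.sqrt(2.0 * variance_sum * z) + b * z / 3.0

def empirical_tail(n, z, trials, seed=0):
    """Fraction of trials in which |sum of n Uniform(-1, 1) draws| exceeds t(z)."""
    rng = random.Random(seed)
    v = n / 3.0                       # a Uniform(-1, 1) variable has variance 1/3
    t = bernstein_threshold(v, 1.0, z)
    hits = 0
    for _ in range(trials):
        s = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
        hits += abs(s) > t
    return hits / trials

z = 2.0
freq = empirical_tail(n=100, z=z, trials=2000)
bound = 2.0 * math.exp(-z)            # Bernstein: P(|S| >= t(z)) <= 2 exp(-z)
print(freq, bound)
```

Integrating the tail bound in $z$ is what produces the factor $\Gamma(q+1)$ in moment inequalities of the type (A.5).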

We will show that $\mathbb{E}_f[\zeta_1(x)]^q$ is bounded by the expression appearing on the right hand side of (5.2). In fact, we will prove a stronger inequality. Let for some $l > 0$

$$\lambda_h := [1+q]\ln(1/V_h) + \ln\big(F^{-1}(x) \wedge n^{l}\big).$$

It suffices to show that (5.2) holds when in the definition of $\zeta_1(x)$ the quantity $M_h(K,x)$ is replaced by $\tilde M_h(K,x)$, where

$$\tilde M_h(K,x) := \sqrt{\frac{2k_\infty A_h(K,x)\lambda_h}{nV_h}} + \frac{2k_\infty\lambda_h}{3nV_h}.$$

Indeed, since $n^{-d} \le V_h \le 1$ for any $h \in \mathcal H$, we have that

$$\lambda_h \le (q+1)d\ln n + l\ln n.$$

Therefore $\tilde M_h(K,x) \le M_h(K,x)$ for all $x \in \mathbb{R}^d$ and $h \in \mathcal H$ provided that

$$\varkappa \ge k_\infty[d(2q+4) + 2l].$$

Thus if we establish (5.2) with $M_h(K,x)$ replaced by $\tilde M_h(K,x)$, the required bound for $\mathbb{E}_f[\zeta_1(x)]^q$ will be proved.

We have for any $h \in \mathcal H$

$$\exp\{-\lambda_h\} = (V_h)^{q+1}\big\{F(x) \vee n^{-l}\big\}.$$
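Furthermore, taking into account that $A_h(g,x) \le V_h^{-1}\|g\|_\infty$ for any $g$, we obtain

$$\sqrt{\frac{2k_\infty A_h(K,x)}{nV_h}} + \frac{2k_\infty}{3nV_h} \le \frac{2k_\infty}{\sqrt{n}\,V_h}.$$

Here we have used that $n \ge 3$. If we set $z = \lambda_h$ then (A.5) together with the two previous display formulas yields

$$\mathbb{E}_f[\zeta_1(x)]^q = \mathbb{E}_f\sup_{h\in\mathcal H}\big[|\xi_h(x)| - \tilde M_h(K,x)\big]_+^q \le \sum_{h\in\mathcal H}\mathbb{E}_f\big[|\xi_h(x)| - \tilde M_h(K,x)\big]_+^q \le 2^{d+1}\Gamma(q+1)(2k_\infty)^qn^{-q/2}\big\{F(x) \vee n^{-l}\big\}. \tag{A.6}$$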



As mentioned above, under the same conditions inequality (A.6) holds for $\mathbb{E}_f[\zeta_3(x)]^q$.

As for the moment bound for $\zeta_4(x)$, in the formulas above $K$ should be replaced by $Q$ and $k_\infty$ by $k_\infty^2$ since $\|Q\|_\infty \le k_\infty^2$. Specifically, if $\varkappa \ge k_\infty^2[d(2q+4) + 2l]$ then

$$\mathbb{E}_f[\zeta_4(x)]^q \le 2^{d+1}\Gamma(q+1)(2k_\infty^2)^qn^{-q/2}\big\{F(x) \vee n^{-l}\big\}.$$
3°. Now we turn to bounding $\mathbb{E}_f[\zeta_2(x)]^q$. We have similarly to (A.5)



Ef 



\^h,r,ix)\ 



l2kl^AhVr,{Q,X 



2k^z 



hVrj 



3nVi 



hyr) 



< 2r{q+l) 



l2kl,Ahvr,{Q,x) , 2k2 



+ 



nVhvr, 3nVh\/n 
Here we have used the following bound on the second moment of £,h,rjix)' 



%l6^,r,(^) 



< 



< 



||Q/i,r?||c 
WQWo 



hVri 



\Qhvv(i-x)\f{t)dt 



fit)dt 

^lo^hVr,(.Q,x) 



nVh 



hVri 



The further proof goes along the same lines as the above proof with the following minor modifications: in all formulas $k_\infty$ should be replaced with $k_\infty^2$, $V_{h\vee\eta}$ should be written instead of $V_h$, and $\varkappa$ should satisfy $\varkappa \ge k_\infty^2[d(2q+4) + 2l]$. The statement of the lemma holds with constant $C_0 = 2^{2d+1}\Gamma(q+1)(2k_\infty^2)^q$. Combining the above bounds we complete the proof. ■



Appendix B: Proofs of auxiliary results of Section 6 
B.1. Proof of Lemma 3

We have

$$B_h(f,x) = \int K(u)\,[f(x+uh) - f(x)]\,du = \int\prod_{j=1}^dw_\ell(u_j)\,[f(x+uh) - f(x)]\,du.$$

First, we note that $f(x+uh) - f(x)$ can be represented by the telescopic sum

$$f(x+uh) - f(x) = \sum_{j=1}^d\Delta_{u_jh_j,j}f(x_1,\ldots,x_j, x_{j+1}+u_{j+1}h_{j+1},\ldots,x_d+u_dh_d), \tag{B.1}$$

where we put formally $h_{d+1}u_{d+1} = 0$.

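Next, for any function $g : \mathbb{R}^d \to \mathbb{R}^1$ and $j = 1,\ldots,d$ we have

$$\int w_\ell(u_j)\Delta_{u_jh_j,j}g(x)\,du_j = (-1)^{\ell-1}\int w(z)\sum_{i=1}^{\ell}\binom{\ell}{i}(-1)^{i+\ell}\Delta_{izh_j,j}g(x)\,dz = (-1)^{\ell-1}\int w(z)\Delta^{\ell}_{zh_j,j}g(x)\,dz. \tag{B.2}$$

The last equality follows from the definition of the $\ell$-th order difference operator (3.1). Thus (B.2) and (B.1) imply that $B_h(f,x) = \sum_{j=1}^dB_{h,j}(f,x)$, where

$$B_{h,j}(f,x) := (-1)^{\ell-1}\int\cdots\int w(z)\Delta^{\ell}_{zh_j,j}f(x_1,\ldots,x_j, x_{j+1}+u_{j+1}h_{j+1},\ldots,x_d+u_dh_d)\,dz\prod_{m=j+1}^dw_\ell(u_m)\,du_m.$$

Therefore, by the Minkowski inequality for integrals [see, e.g., (Folland 1999, Section 6.3)]

$$\|B_{h,j}(f,\cdot)\|_{r_j} \le \int\cdots\int|w(z)|\,\big\|\Delta^{\ell}_{zh_j,j}f(\cdot,\ldots,\cdot,\cdot+u_{j+1}h_{j+1},\ldots,\cdot+u_dh_d)\big\|_{r_j}dz\prod_{m=j+1}^d|w_\ell(u_m)|\,du_m = \int\cdots\int|w(z)|\,\big\|\Delta^{\ell}_{zh_j,j}f\big\|_{r_j}dz\prod_{m=j+1}^d|w_\ell(u_m)|\,du_m.$$

Since $f \in \mathbb{N}_{\vec r,d}(\vec\beta,\vec L)$ one has

$$\|B_{h,j}(f,\cdot)\|_{r_j} \le L_jh_j^{\beta_j}\int\cdots\int|w(z)||z|^{\beta_j}dz\prod_{m=j+1}^d|w_\ell(u_m)|\,du_m \le C_1L_jh_j^{\beta_j}.$$

This proves (6.6). To get (6.7) we first note that the condition $s \ge 1$ implies $\tau(p) > 0$ and $\tau_j > 0$, $j = 1,\ldots,d$. Then the inequality in (6.7) follows by the same reasoning with $r_j$ replaced by $q_j$, $\beta_j$ replaced by $\gamma_j$ and with the use of embedding (6.2). ■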



B.2. Proof of Proposition 1 

By definition of Jm and Af^, 



J„, < 2f(-+i)99P|^^|, (B.3) 
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and now we bound from above | \ ■ By definition of we have for any h G H 

d 



{x : supM,(x) > 2— V} + E ^ B*^Af,^) > 2"^"V} 



r]>h 



= : Jm,l{h) + Jrn,2ih). 



Recall that with the introduced notation 



For any h £ T-L we have 



= : J^W + i'lW- 



By the Chebyshev inequality and (6.12) for any h 

i'iw< E [2(™-'V]~'iis;:,,(/,-)ii::^: < ci 



E 2 



-'•j"^ip-'-:>L''/hf''\ 



(B.4) 



(B.5) 



(B.6) 



In addition, if s > 1 then the Chebyshev inequality and (6.13) yield 



In order to prove statements (i)-(iii) of the proposition we bound the quantities $J_{m,1}(h)$ and $J_{m,2}(h)$ with bandwidth $h = h[m]$ specified in an appropriate way.



B.2.1. Proof of statement (i) 

1*^. We start with bounding the term Jm,i{h) on the right hand side of (B.4). Assume that h £ Ti 
is such that 



x5Vr^ < 2™-V; 



(B. 



then by the Chebyshev inequality 



Jm,i{h) < \{x:snpJKA^{x)6Vf^>2"'-^ip} 



ri>h 



rf>h 



where we have taken into account that, for any rj, \\A^\\g < R if f £ Gg{R) with ^ < 1, and 
ll^r^lli < k^- By definition of H, for any r] > h, rj £ n, we have Vr, = 142^1+-+*^'^ for some 
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ki,...,kd>l, which imphes that J^r^^h ^7"^ < (1 - 2-'')-^V;7^ Thus, we conclude that for any h 
satisfying (B.8) one has 

Jm,i{h) < C3 (2-2-(^-2jy-i)' . (B.9) 
2°. Let h = {hi, . . . , ha) G (0, oo]'^ be given by 



m f -I 0(2 + 1/13) 

= (c4LtV)^/^^2^A j = l,...,d, (B.IO) 

where constant C4 will be specified later. Let us prove that h £ [n~^, l]'^ for large enough n. 
Denote 

_ l-e/s + l/P _ g(2 + l//3) 
" 1 + 0/5 ' ' r,(l + l/s)' 

and remark that a > 0. We note also that 
If 6j < then, because m < 0, 

for all large enough n. On the other hand, since 0>m>mo{9) and 2""«('^)'' < 2"'ci>cip by definition 
of mo (6*), 

hj < {ciL-\f/l^^2 = (c4L-V)^/^^(2™"(^)'')'"'^- 

i)j l + i)j/a 

< (c4L7i)i/^^(2-"cix)^(^^^ < C5(c4Lol)V/3^ 

where we took into account that l + bja~^ > and minj=i^...^(i > Lq > 0. Then choosing constant 
C4 small enough we have hj < 1. Thus we showed that hj £ [n~^, 1] for j such that bj < 0. 
Now consider the case bj > 0. Here 

(2+l/,a)(l-fl/rj) 

It remains to note that 

_l/^j-e/{P,rj] 



f3j{i-e/s + i/i3) i-e/s + i/(3 



< 1, Vj = l,...,(i, 



in view of the obvious inequality 1//5 — 9/s > l/f3j — 9/{f3jrj), which, in its turn, follows from the 
fact that 6 G (0, 1]. Thus, we have that hj > n^^ for all large enough n. Furthermore, if bj > then 
since m < 

hj < (c4L-i)i/^^(/ji/^J < 1 

for all large enough n. Thus we have shown that h G [n~^, l]"^. 
3°. Now we proceed with bounding $J_{m,2}(h)$ for a specific choice of $h = h[m]$, which is defined as follows. Let $h[m] \in \mathcal H$ be such that $h[m] \le h \le 2h[m]$. Let the constant $c_4$ in (B.10) be chosen so that $c_4 \le (2c_1)^{-1}$, where $c_1$ appears on the right hand side of (6.12). With this choice of $c_4$, by (6.12)

P;:,H,,(/>-)lloo < ciL,{h,[m]f^ < ciL.h^/ < 2™-V. 

Therefore, •^^2(^1'^]) ~ 0' where J^\{') is defined in (B.5). Moreover, we obtain from (B.6) and 
h[m\ < hj that 

, > r. f 2+1/13 \ ( 2+1/13 \ 

Jrn,2{Hm]) < Cl 2'-^"'ip~'-^L''/hy"' <Ci J2 ^ ""^^^^ ^ < C'jl~'^\^^^l . (B.ll) 

/\-foo je/\/oo 

Note that 

yh{m\ > = 2-^cy^L^V'/^2"(^- . (B.12) 

This together with (B.9) yields 

_ f 2+1/13 \ 

Jm,i{h[m]) <cs2 ""y^To+TT^J. (B.13) 
Then it follows from (B.ll) and (B.13) that 

Jm,i{h[m]) + Jm,2{h[m]) < cq2-"'^^^) , 
which combined with (B.3) results in 

Jm <Cio2™AP-I7e+I7ii(pP. 

Inequality (B.3) is valid only if (B.8) is fulfilled for h[m], i.e., X'^V'^^^j < 2™'~^(/j; now we verify 

this condition. It is sufficient to check that k62^V^^ < 2'"~^(^. In view of (B.12) this inequality 
will follow if 

cy^99i+V/32'"('+^- > 2'^+2x(L^5). 
Taking into account that Ljjd = (^2+i//3 we conclude that (B.8) is fulfilled for h[m] if 

( i-e/s+i/^ \ 

which is ensured by the condition m > mQ{9). This completes the proof of (6.23). 
B.2.2. Proof of statement (ii)

l". Let C4 be a constant to be specified later, and let C4 be the constant given in (B.IO). Let Cj = C4 
if j £ -^oo and Cj = C4 if j £ I \ loo- Define h = {hi, . . . , hd) G (0, 00]'^ by the formula 

/ 1 3(2+l/;3) '\ 

hj = {CjL-\f'^^2'''^f'3 Pj^j \ j = l,...,d. (B.14) 

Note that if j G Jqo the corresponding coordinates of h given by (B.IO) and (B.14) are the same. 
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Let us show that /i G [n ^,1]*^ for large enough n. First consider the coordinates hj such that 
1 — ^(2 + > 0. Because m > we have for ah n large enough 

where we have used the obvious inequality /5j//3 > 1 for any j = 1, . . . On the other hand, 
because 2™ < C2^~^ we obtain 

s(2+l/;3) 



hj < cii (c4L-i)Vft ^ 0, n ^ oo, Vj G / \ loo; 

s(2+l//3) 

hj<cii{C,L-')^/^^^ "^'■^ <Cll(c4Lol)l/^^ ViG/oo. 



Thus hj < 1 for large enough n if j G / \ /qo, and /ij < 1 by choice of constant C4 if j G /qo- 
Now consider the case 1 - ^(2 + < 0. Since 2™ < C2(/?~^ 

for all n large enough. Here we have used the obvious inequality 1/s > 1/ PjVj Vj = 1, . . . , d. On 
the other hand, since m > 0, hj < {CjL-J^ifY'^i < 1 for large enough n. Thus we have proved that 
h G [n~^, for all large enough n. 

2^. Let h[m] G Ti such that /i[m] < h < 2h[m] and choose constant C4 satisfies C4 < (2ci)~^ [see 
(6.12)]. Recall that formulas (B.IO) and (B.14) coincide for j G loo- Therefore, as before, with the 
indicated choice of C4 we have 

i'U^H) =0. (B.15) 

Let (3± and [Sqo be defined by expressions l//3± := J2jei+ui- ^/f^j f/Z^oo := Z^je/^ l/Z^j- We 
have 

V^MH > = l-^^^-cl/^^L-^'^'/^l-^^. (B.16) 

This together with 2™ < 029?"^ shows that Vh[m] > cisc]/^^ L'j^^tp^'^^/^ = cisc\^^^5 and, therefore, 

>^6V~^^^<Cr,'c~'^^^>c. (B.17) 
Remark that A;,(2;) < 2'^Mk^ for all x G M'^ and ij £V.. Hence, in view of (B.17) 



sup M,{x) < koo^/¥Mj^5V^yl^+>c6V~^\^ 

r)>h[m\ 

It yields together with (B.16) 

supM,(x) < C15 c-i/(2fe)2-^(L;35)(^-i//3 = C4-^/('^±)2-(^. 
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Setting C4 so that ci^c^^^^'^^'^^ < 2 we obtain sup^>^M^(2;) < 2"^ ^99. This imphes that 

Jm,i{h[m]) = 0. 

Moreover, it follows from (B.6) and from inequality h[m] < h that 



(1) 



2-ms{2+l//3) 



(B.18) 



(B.19) 



Then (6.24) is a consequence of (B.3), (B.15), (B.18) and (B.19). The statement (ii) is proved. 
B.2.3. Proof of statement (iii)

1°. Let Cj, j = 1, . . . ,(i be the same constants in the proof of statement (ii) in the previous section. 
Define h = [hi, . . . , hd) G (0, oo]'' by the following formula 



hi 



^( J ^-(2+l/7) ^ 



j = 1, . . . ,d, 



(B.20) 



where 7j, are defined in (6.1) and 7, v and are given in (6.15). 
Let us show that h G [n~^, 1]*^ for large n. Let 6j = 1 — ^(2 + I/7). 

First, assume that bj < 0. Since m > and 2™ < C2<^~^, 



/ij >Cl6(CJL-l)V7.(^- 



li(2+l/7) 



3 ^] 



V v(2+l/l3) 



ci6(CjLTi)V7, [L^]^j >5> n~\ 



where we have used the obvious inequality 1/v > l/i'yjQj) for any j = 1, . . . ,d. On the other hand, 
in view of m > mi and by (6.20) 



11 Oj 



.l/7,,i(l+[l-t(2+l//3)]^T5f0f^^;aW) 



mi I 



< (CjLjV)^^'^'2~ 2™i["(2+i/7)-s(2+i//3)] 
Then by (6.21) 

hj < cnCi{L)C;''''v 

where the expression for constant Ci(L) is easily found. It remains to note that 

vi2 + 1/7) - s(2 + = sv [(2 + l//3)(l/s - 1/v) + (I/7 - l//?)^-^ 

and in view of (6.16) and (6.17) 

1 [Qj - s{2 + l/f3)]v{l//3 - 1/7) ^ 2 + 1//3 [ (1/. - l/v)qj + (I/7 - 1/13) 
qj[v{2 + l/^)-s{2 + l/f5)] qj 

This shows that hj < 1 for large n. 



J «(2+l//3) ' 

7,- liQi t 



(B.21) 



(l/s-l/t;)(2 + l/7) + (l/7-l//3)i;- 



> 0. 
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Now assume that bj > 0. Then, similarly to the reasoning that resulted in (B.21) we have 



Since (^2+i//3 = i 



(l/a-l/t,)(l/7j) + (l/7-l/;3)(l/7jrj) 

hj > ci8Ci{L)Cy^' 6 (i/»-i/«)(2+i/7)+(i/7-i/«--i >5>n-'^ 

for all n large enough. Here we have used (6.16), (6.17), and obvious inequalities: 2 + I/7 > 1/7^- 
and 1/v > l/jjVj for all j = 1, . . . , d. On the other hand, since 2"^ < C2ip~^ 



hj < ci9(CjL-i)V7.(^ 



"(2+1/7) 



ci9(C7,Lt1)1/7, 



u "'(2+l/<3) 



Therefore, hj — > 0, n — )• 00, \fj & I \ loo, and hj < cig (c4L^^)^/7j ^ \/j G /qo- Choosing C4 small 
enough we come to required assertion. 

3^. Let h[m] £ Ti he such that h[m] < h < 2h[m], and let constant C4 satisfy C4 < (2ci)~^, where 
ci is given in (6.12). With this choice of C4, if j € loo then the corresponding coordinates of h given 
by (B.14) and (B.20) coincide. Hence we have as before 



Let :^ := Ejei+ui^ t^en 



-2m 



(B.22) 



(B.23) 



We remark that (B.23) and (B.16) coincide up to the change in notation /3± ^ 7±. Hence all the 
computations preceding (B.18) remain valid, and we have as before 



Jm,i{h[m]) = 0. 



Moreover, we obtain from (B.7) 



L^(^V7 



<2—'mv{2-{-l / ■y) 



(B.24) 



(B.25) 



The bound given in (6.25) follows now from (B.3), (B.22), (B.24) and (B.25). 



B.3. Proof of Proposition 2 

In view of (2.13) and (6.11) 

\f{x) - f{x)\ < co[Uf{x)+io{x)] < ci[Uf{x)+u;{x)], (B.26) 

where cq and ci are appropriate constants, Uf{x) and Uf{x) are given by (6.8) and (6.10) respec- 
tively, and a;(x) := C{x) + xix) with (^(x) and xix) defined in (2.14) and (2.15). 
< C2



B.3.1. Proof of statement (i)

Here for brevity we will write mo = mo(l). By (B.26) 

\f{x)-f{x)\Pdx < cP-' f [Uf{x)+u{x)Y-'\f{x)-f{x)\dx 

(2"^VF"^ [ \f{x) - f{x)\dx+ [ ujP-\x)[2"'"^ + uj{x)]dx . 
Noting that ||/||i < ||K||i < 

kooi we have ||/ — < kco + 1 and, therefore, 
/ |/(x)-/(x)|dx < (koo + l)(2^'"Vr-'- 

jRd 

Moreover, since x = k^[(4(i + 2)p + 4((i + 1)], the second statement of Theorem 1 implies 

Ef [ w?'-i(x)[2"*V + w(x)]dx < C3(2™V)™^^^"^^^^ +C4n"P/^ 
Combining these inequalities and taking into account that 2'"0(/? < 1 we obtain 

J-^ =Ef f \f{x) - f{x)\Pdx < C5 [(2"^^)^"^ + 2'^°ipn-'^P~^y^ + n-^/^j 

By definition of mo = mo(l), 2"^V < C6(L/3(5)^''^^+^^''"^/^^; therefore 
It remains to note that for large n that 

and (6.26) follows. 

B.3.2. Proof of statement (ii)

Let /* be the maximal operator of / defined in (4.1). It follows from the definition of -/Vf^(x) that 
for any h £ Ti 



>if*(x)lnn xlnn 
SUpM^ X <C8W^^^T^ + — f^- 

Moreover, by definition of $B_h(f,x)$, $B_h(f,x) \le c_9[f^*(x) + f(x)] \le 2c_9f^*(x)$ almost everywhere, where the last inequality follows from the Lebesgue differentiation theorem. Using these two inequalities and setting $h = (1,\ldots,1)$ in (6.8) we come to the following upper bound on $U_f(x)$:

$$U_f(x) \le c_{10}\Big[f^*(x) + \sqrt{f^*(x)\delta} + \delta\Big]. \tag{B.27}$$
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In view of (6.11) we have that X'^^^q^ ^ X" := {x G M'^ : Uf{x) < koo2™o(^V}; therefore if we put 
Di := X- r\{x eM!^ : f*{x) < 6}, D2 := X'nixeR"^ : f*{x) > 6} 



then 



J-„(,)<1E/ /_ \f{x)-f{x)\Pdx + Efl_ \f{x)-f{x)\Pdx=:EfSi+EfS2. 



(B.28) 



D2 



We bound from above the two terms on the right hand side of the above inequahty. 
First consider E^^i. By (B.26) for any 6 G (0, 1) we have 



Si 



Di 



|/» - f{x)\Pdx < d^-' I [Uf{x)+u;{x)Y-'\f{x) - f{x)fdx 



Di 



< ciJdP'^ [ \f{x)-f{x)fdx+ [ ujP-'^{x)[6 + uj{x)fdx 

Here we have used that, by (B.27), Uf{x) < 2cio5 for ah x £ Di. Remind that /(x) = f^^-^{x); 
therefore, for any 6 G (0, 1) 



Ef\f{x)f < {Ef\f{x)\y < [Y.^f\h{x)\) < cu[{lnnmx)Y . 

h&H 

Thus, for any / G G0{R), 

6P-% [ \f{x)-f{x)fdx < 6P-'\\\f\\1> + cu{lnny'\\r\fA < d^'-' R\\nnf' . 
Furthermore, because x = k^[(4(i + 2)p + 4((i + 1)], by the second statement of Theorem 1 



E 



/ ujP~\x) [5 + uj{x)fdx < ci4<5%"(P~^)/2 + ci5n-P/2 < ^sn' 



-p/2 



Combining the last two inequahties we obtain 

EfSi = Ef [ \f{x) - f{x)\Pdx < cie[SP~'R'{ln7^)^' + 

JDi 

Now we proceed with bounding EjS'2. We have 



n 



(B.29) 



EfS' 



f02 



= E/ 

(a) 

< Cl7 
(b) 



|/(x) - f{x)\Pdx < eg Ej / [C7j(x) +a;(x)]^dx 



D2 



D2 



D2 



\Uf{x)fdx + n 



-p/2 



< cn[{2"'°^^'^ip)P-^R^ + n-P/'^] < cig + n-^/^] . (B.30) 

Here (a) follows from the second statement of Theorem 1 and Uf{x) < 2^"^^^ip for x G D2, and (b) 
is valid because Uf{x) < 3cio/*(x) for all x G D2, see (B.27). 

Combining (B.29) and (B.30) with (B.28), and taking into account that ,5^"i-e/«+i/,8 (Inn)''^ 
as n — )• 00 whenever < 1 we obtain 



p-e 
i-e/s+i/ft 



+ n 



< 



C2o{Li3S) 



pv{e) 



as claimed. 